Pesticide transfers in agricultural catchments are responsible for diffuse but major risks to water quality. Spatialized pesticide transfer models are useful tools to assess the impact of the structure of the landscape on water quality. Before these tools can be used in operational contexts, quantifying their uncertainties is a necessary preliminary step. In this study, we explored how global sensitivity analysis could be applied to the recent PESHMELBA pesticide transfer model to quantify uncertainties on transfer simulations. We set up a virtual catchment based on a real one, and we compared different approaches for sensitivity analysis that could handle the specificities of the model: a high number of input parameters, a limited sample size due to computational cost, and spatialized outputs. After a preliminary screening step, we calculated Sobol' indices obtained from polynomial chaos expansion, Hilbert–Schmidt independence criterion (HSIC) dependence measures and feature importance measures obtained from a random forest surrogate model. Results from the different methods were compared regarding both the information they provide and their computational cost. Sensitivity indices were first computed for each landscape element (site sensitivity indices). Second, we proposed to aggregate them at the hillslope and catchment scales in order to get a summary of the model sensitivity and a valuable insight into the model hydrodynamic behaviour. Conclusions about the advantages and disadvantages of each method may help modellers to conduct global sensitivity analysis on other such modular and distributed hydrological models, as there has been a growing interest in these approaches in recent years.

Pesticide transfers from fields to water bodies are a major but also complex environmental concern. Significant efforts are required to assess risks for aquatic ecosystems and human lives. To do so, numerical models that simulate pesticide transfers and fate are necessary tools to support risk management. Among other things, such models make it possible to explore and compare scenarios of exposure and to assess mitigation measures. For this purpose and to support decision-making, physically based models such as in

In the field of pesticide transfer modelling,

Examples of global sensitivity analysis performed on pesticide transfer models.

Table

Therefore, the objective of this paper is to evaluate and compare three new GSA methods with low computational cost for application to the PESHMELBA model. The methods are compared in particular in terms of the interpretability of the sensitivity indices they provide and their reliability (based on the study of their convergence rate). We also investigate whether such approaches suit the spatialized aspect of PESHMELBA. To do so, we investigate the relevance of computing both local and aggregated indices following the recommendations from

The PESHMELBA model represents a catchment as a set of interconnected components that stand for landscape elements such as plots, vegetative filter strips (VFSs), ditches, hedges or rivers

A virtual scenario of limited size is set from a portion of La Morcille catchment (France) in order to explore the different GSA methods and to ease interpretation of spatialized results. The chosen portion is selected so as to remain representative of the global composition of La Morcille catchment in terms of soil, slope, type and size of elements, as well as interface length between them. The chosen scenario is composed of 10 vineyard plots, four vegetative filter strips and five river reaches that delimit a left and a right slope (see Fig.

Soil types on the catchment are mainly sandy

Soil type locations for the case study. Green contours show the vegetative filter strips.

Considering the different soil horizons whose hydrodynamic behaviour and texture must be parameterized, the two types of vegetation (grassland and vineyard), and the different landscape element types (plots, VFSs and river reaches) that are simulated, the scenario results in 145 input parameters to be considered for sensitivity analysis. They are described in Table

Input factor description, corresponding spatial level definition and value for the nominal scenario. Nominal values are explicitly distinguished between vineyard plots and VFSs with the character “–” when needed.

For the nominal scenario, values for bulk density bd and organic carbon content moc are available from

Soil parameters for SU1, 2 and 3. Hydrodynamic parameters are based on the van Genuchten retention curve and on Schaap–van Genuchten conductivity curve fitting. Parameters are described in Table

The pesticide chosen in this study is tebuconazole as it is a fungicide widely used in the La Morcille catchment. It is a slightly mobile molecule, and we use a Freundlich isotherm to describe its adsorption equilibrium. Adsorption parameters are obtained from

A no-flux boundary condition is applied on all sides except on the surface where rain and potential evapotranspiration are considered. Rain data are extracted from the BDOH database

Climatic forcing (rain and potential evapotranspiration) for the simulation. The dotted red line stands for the one-shot pesticide application.

Although the scenario is virtual, we aim to set initial conditions that are as plausible as possible. As running the model on a warm-up period is not possible due to data availability and computational cost limitations, initial water table levels are deduced from piezometric data on a neighbouring hillslope, and all soil columns are assumed to be in hydrostatic equilibrium at the beginning of the simulation. Data from several piezometers are available on a transect perpendicular to the river. Data are extrapolated over the virtual hillslope width on both sides of the river. An upstream 0.177 m

Two types of vegetation are represented in this scenario. Vineyard cover is considered on plots, while permanent grassland is simulated on VFSs. Considering the period of simulation (3 months), a fixed root depth (Zr

Finally, ponding height is set to 0.01 m for vineyard plots, while an increased value is set for VFSs (0.05 m). According to

The input factor distributions are set to be as representative as possible of the available data on the catchment and the associated uncertainties. Mean values are taken from the nominal scenario described in Sect.

As commonly found in the literature

Using a fully distributed model such as PESHMELBA raises the issue of sampling strategy. Indeed, in this case study, even if the site is only composed of 14 surface units, the large number of soil horizons on the catchment, combined with the hydrodynamic distinction between plots and VFSs, dramatically increases the number of parameters. Sampling all parameters independently on each spatial unit would require a number of simulations that is computationally unaffordable. Moreover, such independent sampling on a very large number of parameters may lead to misinterpretation of the sensitivity analysis results, as the influence of physical processes could not be distinguished from that of the spatial arrangement. For each sample, one set of soil parameters is therefore sampled for each soil horizon, and those parameters are applied to all spatial units that contain this horizon, which sets the number of parameters to be considered in the GSA at 145.
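The sampling strategy described above can be illustrated with a minimal sketch. The unit names, horizon names, parameter names and uniform ranges below are hypothetical placeholders, not values from the case study; the only point is that each horizon is sampled once and the same values are shared by every spatial unit containing that horizon.

```python
import random

# Hypothetical layout: soil horizons contained in each spatial unit.
UNIT_HORIZONS = {
    "plot_1": ["h1", "h2"],
    "plot_2": ["h1", "h3"],
    "vfs_1": ["h2", "h3"],
}

# Hypothetical uniform ranges for two soil parameters (illustrative only).
PARAM_RANGES = {"Ks": (1e-6, 1e-4), "thetas": (0.30, 0.45)}


def draw_sample(rng):
    """Draw one parameter set per horizon and share it across units."""
    horizons = sorted({h for hs in UNIT_HORIZONS.values() for h in hs})
    per_horizon = {
        h: {p: rng.uniform(lo, hi) for p, (lo, hi) in PARAM_RANGES.items()}
        for h in horizons
    }
    # Every unit points to the parameter set of the horizons it contains.
    per_unit = {
        unit: {h: per_horizon[h] for h in hs}
        for unit, hs in UNIT_HORIZONS.items()
    }
    return per_horizon, per_unit


rng = random.Random(0)
per_horizon, per_unit = draw_sample(rng)

# The GSA dimension is the number of (horizon, parameter) pairs,
# not the number of (unit, horizon, parameter) triplets.
n_inputs = sum(len(params) for params in per_horizon.values())
print(n_inputs)
```

With this layout, the analysis handles 6 input factors instead of the 12 that independent sampling on each unit would produce.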

Although the PESHMELBA model is dynamic, model outputs considered in this paper are scalar quantities rather than time series to keep the problem simple. In order to investigate PESHMELBA abilities to properly represent transfers in a heterogeneous landscape, sensitivity analysis is performed on four hydrological and quality variables: (1) cumulated water volume transferred in the subsurface (saturated lateral transfers), (2) pesticide mass transferred in the subsurface (saturated lateral transfers), (3) cumulated water volume transferred on surface (surface runoff) and (4) cumulated pesticide mass transferred on surface (surface runoff). However, these quantities are spatialized, leading to multidimensional outputs. To deal with the spatialized aspect, GSA is first performed on scalar quantities, on each landscape element (see Sect.

Full workflow used to perform GSA on the PESHMELBA model in three steps: (1) screening, (2a) ranking at a local scale and (2b) ranking at the catchment scale.

The full workflow used to perform GSA in the PESHMELBA model is summarized in Fig.

In what follows, we denote

Variance-based methods aim at determining how input factors contribute to the output variance

Direct computation of Sobol' indices requires a large sample size that cannot be afforded in this case study. As a result, we compute Sobol' indices from a limited sample size, based on polynomial chaos expansion (PCE;
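As a minimal illustration of why a PCE gives access to Sobol' indices at no extra model cost: for an expansion on an orthonormal polynomial basis, the output variance is the sum of the squared non-constant coefficients, so the indices follow from grouping coefficients according to the inputs their basis terms involve. The coefficient values below are hypothetical, and fitting the expansion itself is not shown.

```python
# Hypothetical PCE coefficients on an orthonormal basis, keyed by
# multi-index (polynomial degree in x1, in x2, in x3).
coeffs = {
    (0, 0, 0): 2.0,  # constant term (output mean)
    (1, 0, 0): 1.0,
    (2, 0, 0): 0.5,
    (0, 1, 0): 0.5,
    (1, 1, 0): 0.5,  # interaction between x1 and x2
    (0, 0, 1): 0.0,
}


def sobol_from_pce(coeffs, n_inputs):
    """Return first- and total-order Sobol' indices from PCE coefficients."""
    # Variance: squared coefficients of all non-constant terms.
    variance = sum(c ** 2 for alpha, c in coeffs.items() if any(alpha))
    first, total = [], []
    for i in range(n_inputs):
        # First order: terms that involve input i and no other input.
        only_i = sum(
            c ** 2
            for alpha, c in coeffs.items()
            if alpha[i] > 0
            and all(d == 0 for j, d in enumerate(alpha) if j != i)
        )
        # Total order: every term in which input i appears.
        with_i = sum(c ** 2 for alpha, c in coeffs.items() if alpha[i] > 0)
        first.append(only_i / variance)
        total.append(with_i / variance)
    return first, total


first, total = sobol_from_pce(coeffs, n_inputs=3)
for i, (s, st) in enumerate(zip(first, total), start=1):
    print(f"x{i}: first-order = {s:.3f}, total-order = {st:.3f}")
```

The gap between the total- and first-order index of an input measures the share of variance it contributes through interactions, which is the information the other two methods do not separate.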

Sensitivity measures based on the Hilbert–Schmidt independence criterion (HSIC;

The HSIC theory relies on reproducing kernel Hilbert space (RKHS) and kernel functions. Let

The resulting sensitivity indices proposed by

When using a universal kernel, the HSIC indices can also be statistically used for screening purposes
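To make the HSIC-based measure more concrete, the sketch below estimates it from a sample with the standard biased (V-statistic) estimator, trace(KHLH)/n², computed here through double-centred Gram matrices. Several choices are assumptions made for illustration rather than the settings of this study: a Gaussian kernel, a bandwidth equal to the sample standard deviation, and a normalized index obtained by dividing by the square root of HSIC(X, X) · HSIC(Y, Y). The toy model is hypothetical.

```python
import math
import random
import statistics


def gaussian_gram(values, bandwidth):
    """Gram matrix of a Gaussian kernel for a one-dimensional sample."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
        for a in values
    ]


def double_center(gram):
    """Return H K H for a symmetric Gram matrix K, with H = I - 1/n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + grand_mean
         for j in range(n)]
        for i in range(n)
    ]


def hsic(x, y):
    """Biased HSIC estimator, trace(K H L H) / n^2."""
    n = len(x)
    kc = double_center(gaussian_gram(x, statistics.pstdev(x)))
    lc = double_center(gaussian_gram(y, statistics.pstdev(y)))
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


def r2_hsic(x, y):
    """Normalized HSIC index (assumed form), between 0 and 1."""
    return hsic(x, y) / math.sqrt(hsic(x, x) * hsic(y, y))


# Toy model (hypothetical): the output depends non-linearly on x1 only.
rng = random.Random(0)
n = 150
x1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [a ** 2 for a in x1]

r2_x1 = r2_hsic(x1, y)
r2_x2 = r2_hsic(x2, y)
print(f"x1: {r2_x1:.3f}, x2: {r2_x2:.3f}")
```

In this toy case the linear correlation between x1 and the output is close to zero, yet the kernel-based index still separates x1 from the inert input x2, which is the property that makes the measure usable for screening.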

Random forests

As each individual decision tree is very sensitive to the input dataset, bagging is used to avoid correlations between them and to ensure model stability. It consists of training each decision tree on a different training dataset smaller than the original one. Such subsets are built from the original one by resampling with replacement, so that some members are used more than once, while others may not be used at all. Such a technique makes the random forests more robust when facing slight variations in the input space and increases accuracy of the prediction
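The resampling step can be sketched as follows. This is a generic bootstrap drawing as many members as the original sample, which is an assumption; the size of the subsets used in the study is not specified here. The members that are never drawn form the out-of-bag set of the corresponding tree.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw indices with replacement; return in-bag and out-of-bag indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(42)
in_bag, out_of_bag = bootstrap_split(1000, rng)

# Fraction of distinct members that ended up in the training subset.
unique_fraction = len(set(in_bag)) / 1000
print(f"distinct in-bag members: {unique_fraction:.1%}")
```

For large samples the expected fraction of distinct members is 1 - 1/e, about 63 %, so roughly a third of the sample remains available to each tree as an independent test set.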

RF workflow (adapted from

RF structure can be used to provide knowledge about how influential each input factor is. This measure is referred to as

For each tree

estimate

For each input factor

randomly permute

Estimate

For each input factor

compute the mean decrease in accuracy MDA
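The permutation procedure outlined in the steps above can be sketched as follows, assuming mean squared error as the accuracy measure. The "trees" are hypothetical stand-ins (simple functions) rather than fitted decision trees, and the out-of-bag sets are drawn arbitrarily, so that the example stays self-contained; only the permutation logic is the point.

```python
import random


def mse(model, rows, targets):
    """Mean squared prediction error of a model on a set of rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def mean_decrease_accuracy(trees, oob_sets, X, y, rng):
    """Average, over trees, the error increase after permuting each input."""
    n_inputs = len(X[0])
    mda = [0.0] * n_inputs
    for tree, oob in zip(trees, oob_sets):
        rows = [X[i] for i in oob]
        targets = [y[i] for i in oob]
        base_error = mse(tree, rows, targets)  # out-of-bag error of the tree
        for j in range(n_inputs):
            # Permute input j within the out-of-bag sample only.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase = mse(tree, permuted, targets) - base_error
            mda[j] += increase / len(trees)
    return mda


rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [3.0 * row[0] for row in X]  # toy output: depends on the first input only

# Hypothetical stand-ins for fitted trees: each one uses the first input only.
trees = [lambda r, k=k: (3.0 + 0.1 * k) * r[0] for k in range(5)]
# Hypothetical out-of-bag index sets, one per tree.
oob_sets = [rng.sample(range(len(X)), 60) for _ in trees]

mda = mean_decrease_accuracy(trees, oob_sets, X, y, rng)
print(mda)
```

Permuting an input that the trees never use leaves the prediction error unchanged, so its importance is exactly zero, whereas permuting the influential input degrades the predictions and yields a positive value.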

In order to assess the robustness of the calculated sensitivity indices, an additional

After computing local sensitivity indices for each landscape unit on a scalar quantity, aggregated indices are computed at the catchment scale (step 2b, Fig.

Screening is performed on the

For the ranking task, the three methods (Sobol' indices from PCE, HSIC and RF) are applied on each HU based on a

Sobol' total- and first-order indices computed using PCE, HSIC and RF site sensitivity indices for all output variables on HU14 with associated 95 % confidence intervals. RF feature importance measures are normalized by

Considering Sobol' indices, the influential parameters that are identified differ markedly from one output variable to another. They are linked to distinct physical processes that may interact with one another. In this way, sensitivity analysis brings knowledge about how PESHMELBA represents the hydrological functioning of the virtual catchment. Water subsurface flow (top row) is driven by deep soil hydrodynamic parameters related both to vertical infiltration and to subsurface saturated transfers. Water surface runoff (second row) is also mainly influenced by deep soil parameters. Overland flow is therefore identified as being mostly due to saturation rather than to rainfall excess. Subsurface exchanges with the river are also identified as an influential process, as the river bed saturated conductivity (Ks_river) is among the most influential parameters. Such a finding is consistent with the position of HU14, which is directly connected to the river but also to many plots (see Fig.

The rankings provided by the three methods are broadly consistent, giving confidence in their robustness. Looking in more detail, HSIC and RF indices are more similar to Sobol' total-order indices than to Sobol' first-order indices, contrary to the conclusions of

Error bounds are very small for the HSIC and RF indices, in contrast to those of the Sobol' index estimates. The HSIC is expected to be very accurate from very small sample sizes

In order to assess the convergence rate of each method, sensitivity indices are calculated for growing sample sizes, from 50 to 2000 points. Results for water surface runoff and pesticide surface runoff are presented respectively in Figs.

Convergence plot for water surface runoff variable: the figure shows the ranking for the five most influential parameters identified with a sample of

Figure

Convergence plot for solute surface runoff variable: the figure shows the ranking for the five most influential parameters identified with a sample of

Results about pesticide surface runoff (Fig.

In this section, we focus on Sobol' indices (first- and total-order) despite larger error bounds as it is the only method used in this study that allows us to get separate information on interactive effects.
Site rankings such as those presented in Fig.

Maps of Sobol' site sensitivity indices for water surface runoff for the most significantly influential parameters.

Maps of Sobol' site sensitivity indices for pesticide surface runoff for the most significantly influential parameters.

Finally, Fig.

For all variables considered, the number of input parameters retained after screening remained quite high, showing that performing screening on PESHMELBA variables is a challenging task. This can first be attributed to the many physical processes interacting in PESHMELBA in a spatially distributed way, each of them with its own set of characteristic parameters. However, it can also be explained by the methodology, which may not be discriminating enough. Many previous studies developed efficient screening techniques for complex environmental models

By exploring several methods for ranking, the aim was to analyse their specificities and the interest of each one for the sensitivity analysis of a complex environmental model, characterized by many parameters and a high computational cost such as PESHMELBA. Considering the results of this study, we believe that the choice of the method depends on the properties of the model, the objective of the sensitivity analysis and the sample size available:

The Sobol' method remains attractive when sensitivity analysis is used to gain knowledge about the model by finely analysing its behaviour. Indeed, Sobol' indices provide a clear interpretation of the calculated indices (percentages of variance explained) and explicit information about the interactions between parameters. These elements are particularly valuable when one wishes to use sensitivity analysis to understand the functioning of the model, and this is why this approach is still widely used in the hydrological community. In the case of variables that are reasonably complex and that are not characterized by too many interactions between physical processes (such as water variables in our case), using polynomial chaos expansion to estimate Sobol' indices is particularly interesting and efficient since it allows the use of a pre-existing sample of very limited size. Conclusions are more contrasted for complex variables such as pesticide variables as convergence results showed that

If the sensitivity analysis aims at simplifying the model or focusing the calibration efforts, if the physical interpretation of the results is not a priority and if one has a pre-existing sample of very limited size (fewer than 750 points in our case), the use of HSIC indices is a good option as it provides robust sensitivity indices. However, it is important to note that using the HSIC dependence measure for sensitivity analysis is a recent idea and that there is still little knowledge available about identifying and differentiating the types of dependency that are captured. In addition, the choice of the kernel may affect the ranking results because each specific kernel is likely to give more or less importance to the infinite number of dependency forms that are captured by the HSIC. The choice of the kernel is a delicate question that is still rarely addressed in the literature. While a few papers propose to choose the type and parameterization of the kernels in a way that maximizes the possible dependence between

The RF indices are also of interest for sensitivity analysis tasks as they are supposed to provide an estimator of total-order Sobol' indices. Those indices can thus be easily interpreted, and as for the other methods, they can be estimated from a pre-existing sample. However, PCE is still to be preferred since it provides more complete information, including not only the total-order Sobol' indices but also the indices at all orders. In addition, our results showed that the metamodel constructed by RF is of lower quality than the one constructed by PCE at equivalent sample size, giving less confidence in the resulting sensitivity indices.

Sobol' first-order and total-order aggregated sensitivity indices for water surface runoff (left) and pesticide surface runoff (right) calculated at the scale of the catchment (top), left bank (middle) and right bank (bottom). Displayed parameters are the 11 most influential parameters regarding Sobol' indices at the catchment scale for each output variable. The bar colours are related to physical processes: brown is related to soil parameters, and the darker the brown, the deeper the parameter is; blue is related to river parameters, and green is related to vegetation parameters. The filling in of brown bars refers to the soil type of the parameter: soil 1 is not filled, and soil 2 is cross-hatched, while soil 3 is filled with circles.

Beyond the comparison of the different methods, we also tried to evaluate whether it was possible and useful to combine several of these methods. However, considering the results obtained, we believe that combining the tested methods is still of little interest for hydrologists to better understand the model functioning. Indeed, the differences we found in rankings remain difficult to interpret. This is particularly the case when combining Sobol' and HSIC indices because results from the HSIC dependence measure are themselves hard to interpret.

Results about landscape analysis showed that, on the one hand, sensitivity maps provide local, detailed information about influential parameters at each location of the catchment. However, they are computationally costly as one GSA per HU must be performed. This approach may be hard or even impossible to transpose to a real catchment scale composed of several hundred elements. On the other hand, catchment-scale aggregated indices provide synthetic information at a lower computational cost, but the spatialized aspect of the GSA is lost. As pointed out in

In this paper, we have described the first global sensitivity analysis of the modular and coupled PESHMELBA model. For this experiment, a simplified catchment was set in order to explore different approaches for GSA and to propose a methodology for future real applications. First, we performed screening using an independence test based on the HSIC dependence measure, dividing the dimension of the problem by 3. Second, we compared several innovative methods to compute sensitivity measures on each landscape element individually. Sobol' indices were found to be particularly attractive as they provide easy-to-interpret sensitivity measures. However, in the case of complex variables with dominant interactive effects, results showed that they may not be computed from very small samples. Third, we gathered such local sensitivity indices into sensitivity maps that highlighted local contributions of parameters. Finally, we computed aggregated indices at larger scales, on the whole catchment and on each bank hillslope since this scale still reflects spatial heterogeneities of hydrodynamic processes.
This study constitutes the first attempt at a global sensitivity analysis of the PESHMELBA model. Future research should go a step further by considering the other sources of uncertainties that can affect the model and interact with parameter uncertainties. The impact of forcings, soil types, quantities and dates of application of pesticides should be addressed as already done in

Top: parameters and associated dates used to describe LAI evolution for vineyard cover. Bottom: constant LAI value set on grassland cover.

Distribution and statistics of the assigned probability density functions (pdfs) for the 145 input parameters, uniform: U(min,max), triangular: T(min,mean,max), normal: N(mean,standard deviation), lognormal: LN(mean,standard deviation) and truncated normal: TN(mean, standard deviation, min,max).

Remaining parameters after the screening step for each output variable. In the XXX_XXX_XXX syntax of parameter names, the first block is the type of element the parameter refers to (soil horizon, river, vegetation, pesticide, HU or VFS), and the second part is the parameter name, while the last part is the element index the parameter refers to (soil horizon or vegetation type).

The PESHMELBA model is an open-source model coded in Python (Version 2.7.17) and Fortran 90 and embedded in the OpenPALM coupler (Version 4.3.0). The code for the OpenPALM coupler is available from

All authors contributed to writing the text and to all stages of editing. PCE computation was performed by BS and ER, whereas HSIC and RF indices' computation was led by ER, with extensive support from CL and AV.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors kindly acknowledge Jérémy Verrier for his support on HIICS cluster usage, Stefano Marelli for his support on UQLab and PCE usage for GSA, Bertrand Iooss for his advice on the computation of RF feature importance measures, and Julien Pelamatti for his help on interpreting HSIC results.

This paper was edited by Charles Onyutha and reviewed by Fanny Sarrazin, Heng Dai, Razi Sheikholeslami, and one anonymous referee.