How is a global sensitivity analysis of a catchment-scale, distributed pesticide transfer model performed? Application to the PESHMELBA model

Rouzies, Emilie; Lauvernet, Claire; Sudret, Bruno; Vidard, Arthur

doi:https://doi.org/10.5194/gmd-16-3137-2023

Articles | Volume 16, issue 11

© Author(s) 2023. This work is distributed under
the Creative Commons Attribution 4.0 License.

Methods for assessment of models

05 Jun 2023

Final revised paper (published on 05 Jun 2023)
Preprint (discussion started on 22 Dec 2021)

Interactive discussion

Status: closed

RC1:
'Comment on gmd-2021-425', Fanny Sarrazin, 04 Feb 2022

The manuscript entitled ‘How to perform global sensitivity analysis of a catchment-scale, distributed pesticide transfer model? Application to the PESHMELBA model’ aims to provide a methodological contribution to the application of sensitivity analysis to spatially distributed models, taking as an example a pesticide transfer model. The study investigates the application of three different global sensitivity analysis (GSA) methods, namely the variance-based Sobol’ method based on Polynomial Chaos Expansion, the Feature Importance measure based on Random Forest, and the Hilbert-Schmidt Independence Criterion (HSIC), which has been little used with hydrological and transport models so far. It examines the sensitivity at the local scale (homogeneous unit), as well as the landscape scale.

This is a welcome contribution, since performing a sensitivity analysis of spatially distributed models is challenging: simulations are typically computationally expensive, while the number of parameters is large. Many past sensitivity analysis applications have focused on lumped model representations. However, I think that a number of points of this manuscript need clarification and the novelty of the study needs to be better explained. I summarize my main points below, before providing detailed comments.

Main comments:

1) The method section focuses too much on the technical details of the sensitivity analysis methods (which are not new methods, but methods taken from previous studies). The methodology (how these methods are used) is not well explained, which I think should be more the focus of the paper.

2) Some clarifications are needed regarding the setup of the case study (Section 2.2).

3) The manuscript lacks a discussion of the methodology and results with respect to previous studies. This would help to clarify the novelty of the study. In particular:

3a) The authors highlight that this is the first sensitivity analysis applied to the PESHMELBA model (e.g. L588 L26), but sensitivity analysis was applied to other pesticide models (e.g. Dubus et al., 2003; Hong & Purucker, 2018...). The manuscript lacks a review on previous sensitivity analyses (local, global) applied to pesticide models.

3b) It is also not clear to what extent the methodology for sensitivity analysis proposed in the manuscript is new compared to previous sensitivity analysis studies. In this respect, previous studies have also proposed to use first a computationally cheaper sensitivity analysis method (method that requires a relatively low number of model simulations, such as the Morris Elementary Effect Test) to screen non-influential inputs, before applying a computationally more expensive method (e.g. Sobol’ Variance Based method) based on the subset of influential inputs (e.g. Garcia et al., 2019; Vanuytrecht et al., 2014). This could be discussed in the manuscript.

4) I wish to point out that the PESHMELBA model, as well as the code to compute the HSIC sensitivity indices are not publicly available, but are available upon request from the corresponding author (Code and data availability section). To advance open science (and to comply with the GMD guidelines?), I think that it would be valuable to make these resources openly available, especially since the paper has a methodological focus.

Detailed comments:

L21 p1 ‘simple enough to ensure flexibility’: More explanation is needed here. This is vague and I am not sure what is meant by flexibility.

L30-31 p2 ‘catchment-scale model […] afforded’: Specify that this is spatially distributed models.

L61-73 p3: Also note the recent study of Smith et al. (2021).

Section 2.1: a presentation of the model parameters is missing. How many uncertain parameters that need to be estimated are there? What are the different categories of parameters (e.g. soil, pesticide, vegetation etc., as I can read in Table 2)? Parameters are only introduced much later in Section 2.5 (Table 2), which makes it difficult to follow Section 2.2 that describes the selection of the parameter values. The reference to Table 2 in the caption of Table 1 does not flow well.

Section 2.2:

- Why perform the experiment on a virtual catchment and not a ‘real’ one?

- I understand that the simulation experiment considers the application of the fungicide at the beginning of the winter period. Is this realistic?

- Why perform the experiments over a 3-month winter period? This is a very short time period.

- A justification for the soil moisture initial condition (hydrostatic equilibrium L157) is missing.

Section 2.3-2.4: I think that section 2.3 provides too many technical details that are not necessary to understand the methodology and analyses presented in the paper. The authors recognize themselves that this section could be skipped L183-184. My suggestion is to report only the main equations used to compute the sensitivity indices, while details on the derivation of these equations (that were taken from previous papers and that are therefore not really a contribution of this paper, if I understand correctly) can be moved to the supplements/appendix. I am mostly referring to the description of the Sobol’ and HSIC methods, while I think that the description of the random forest method in Section 2.4 reads very well. The main equations and references of Section 2.3 can be combined with the summary of the GSA methods provided in Section 2.5, to provide the reader only with the information that is needed to understand the methodology and the analyses, while avoiding unnecessary repetitions between Sections 2.4 and 2.5. In addition, I think that an overview of the methodology (why do you need to use the GSA methods?) is needed before introducing the specific GSA methods.

Equation (17):

- The sensitivity index for a given input is the average of the first order indices estimated for the different model outputs, weighted by the outputs variance, am I correct? This paper aims to help applying these methods, therefore I think that interpreting the equations in simple (intuitive) terms, would improve readability and clarity. It is very nice to have the formal mathematical proof for the equation, but the proof does not have any practical implications and could be moved into the supplements/appendix (this is an example of how this section could be simplified, see my previous comment).

- Only first order indices can be estimated for multidimensional outputs? In Figure 10 I see that also the total indices are calculated at the landscape scale. How was this done?

Equation (24): If Xi and Y are not independent, the value of the dependence measure estimated for a given bootstrap resample (that is in a way obtained by randomly attributing values of Y to each value of Xi, if I understand correctly) will tend to be larger than the dependence measure estimated for the original non-bootstrapped sample? Why?

Section 2.4: The GSA workflow is not well explained in the text. In particular, the references to the sample sizes used are confusing. I read that 1000 points are used for PCE (L382), 4000 points for HSIC (L391), that 1000 points were derived from the 4000 points used for HSIC and that 1000 points are used for RF. It is only by looking at Figure 5 that I finally understood that these numbers are linked: 4000 points initially used for HSIC and then based on HSIC screening 1000 points are selected for all subsequent analyses. However, I am still a bit unsure why it is written L374 that ‘a variance decomposition method was first used’, isn’t it HSIC?

L416 p17 ‘100 replications were used’: Why using 100 replications for bootstrapping? 1000 bootstrap resamples are typically used (e.g. Archer et al., 1997; Yang, 2011).

Table 2: I believe that the LAImin and LAIharv are missing. The Table would also need to include an additional column that specifies at which spatial level the parameters are defined (e.g. soil horizon, plot/VFS). It took me a while and a bit of digging in the manuscript to get this information. I would also add the value of the standard scenario in Table 2, this would further improve readability.

Section 2.5: this section does not clearly explain that the vegetation parameters and hpond are considered for vineyard plots and VFSs separately. As already mentioned in my previous comment, I think that the parameter should be clearly introduced in Section 2.1, which would improve readability and clarity.

Section 3: As mentioned in my main comments, the manuscript lacks a discussion of the methodology and results with respect to previous studies, which could be highlighted in an additional discussion section.

P463 ‘It is commonly stated that […]’. This sentence needs to be better justified. A reference is missing (e.g. Wagener & Pianosi, 2019). It can also be that many parameters are influential, but have only a small impact on the output except for a few parameters (e.g. five or six) that dominate the output variability.

L566-568: Could you explain more why is it more costly to assess the sensitivity analysis at the local scale compared to the catchment scale? From Eq.17, it looks that anyway the catchment scale indices require the calculation of the local scale indices.

Minor edits:

L47 p2 and L569 p26: replace ‘computational price’ by ‘computational cost’

L53 p2 ‘covariance’: this is a technical term; I suggest specifying that it refers to input interactions.

Section 2.3.3: clearly state that this section refers to regression trees (as a classification tree can also be considered).

P18 L450: I would replace ‘led’ by ‘leads’ (present tense), since these simulations were actually not performed.

P18 L453 ‘hydrodynamical parameter’: do the authors mean ‘soil parameters’ (as reported in Table 2)?

Table 2: There is an issue in the table header.

Figure 7: It is difficult to read the confidence intervals for the bars that have a dark brown color. I suggest revising the color scheme.

Appendix B: veget_LAI_min_1 appears twice.

References:

Archer, G. E. B., Saltelli, A., & Sobol, I. M. (1997). Sensitivity measures, ANOVA-like techniques and the use of bootstrap. Journal of Statistical Computation and Simulation, 58(2), 99–120. https://doi.org/10.1080/00949659708811825

Dubus, I. G., Brown, C. D., & Beulke, S. (2003). Sensitivity analyses for four pesticide leaching models. Pest Management Science, 59(9), 962–982. https://doi.org/10.1002/ps.723

Garcia, D., Arostegui, I., & Prellezo, R. (2019). Robust combination of the Morris and Sobol methods in complex multidimensional models. Environmental Modelling & Software, 122, 104517. https://doi.org/10.1016/j.envsoft.2019.104517

Hong, T., & Purucker, S. T. (2018). Spatiotemporal sensitivity analysis of vertical transport of pesticides in soil. Environmental Modelling & Software, 105, 24–38. https://doi.org/10.1016/j.envsoft.2018.03.018

Smith, J., Lin, L., Quinn, J., & Band, L. (2021). Guidance on evaluating parametric model uncertainty at decision-relevant scales. Hydrology and Earth System Sciences Discussions, 1–30. https://doi.org/10.5194/hess-2021-324

Vanuytrecht, E., Raes, D., & Willems, P. (2014). Global sensitivity analysis of yield output from the water productivity model. Environmental Modelling & Software, 51, 323–332. https://doi.org/10.1016/j.envsoft.2013.10.017

Wagener, T., & Pianosi, F. (2019). What has Global Sensitivity Analysis ever done for us? A systematic review to support scientific advancement and to inform policy-making in earth system modelling. Earth-Science Reviews, 194(February), 1–18. https://doi.org/10.1016/j.earscirev.2019.04.006

Yang, J. (2011). Convergence and uncertainty analyses in Monte-Carlo based sensitivity analysis. Environmental Modelling & Software, 26(4), 444–457. https://doi.org/10.1016/j.envsoft.2010.10.007

Citation: https://doi.org/10.5194/gmd-2021-425-RC1
- CC3: 'Reply on RC1', Emilie Rouzies, 11 Apr 2022
  
  Dear Reviewer 1,
  Thank you very much for the careful review and edits to the initial submission. Although the revision period is not over, we propose to start answering your detailed comments so as to keep the discussion as interactive as possible. All your comments and questions have been copied hereafter in italic. The revised manuscript will be provided at a later stage so as to accommodate comments from all reviewers.
  Detailed comments
  L21 p1 ‘simple enough to ensure flexibility’: More explanation is needed here. This is vague and I am not sure what is meant by flexibility. Here we mean that models used to support decision-making should be designed so that users could easily modify the code to integrate new physical processes and/or adapt the existing ones. “Flexibility” then refers to the structure of these tools that should be ideally simple enough to enable such evolutions. The sentence has been clarified in the manuscript.
  L30-31 p2 ‘catchment-scale model [...] afforded’: Specify that this is spatially distributed models. Yes, added as suggested.
  L61-73 p3: Also note the recent study of Smith et al. (2021). Yes, added as suggested.
  Section 2.1: a presentation of the model parameters is missing. How many uncertain parameters that need to be estimated are there? What are the different categories of parameters (e.g. soil, pesticide, vegetation etc., as I can read in Table 2)? Parameters are only introduced much later in Section 2.5 (Table 2), which makes it difficult to follow Section 2.2 that describes the selection of the parameter values. The reference to Table 2 in the caption of Table 1 does not flow well. The model setup section has been substantially revised so as to present the model parameters much earlier than in the original manuscript. The table from Section 2.5 has been modified and now comes earlier to introduce all model parameters and associated categories. A sentence has also been added to the text to specify the number of uncertain parameters involved in the sensitivity analysis.
  Section 2.2: Why perform the experiment on a virtual catchment and not a ‘real’ one? As mentioned in the text, the final targeted catchment for this study is the real La Morcille catchment. Figure 1 (left) depicts the PESHMELBA mesh at this scale, showing that such an application results in a high number of landscape elements (>500). Conducting experiments at the full catchment scale would have drastically increased the computational cost of the analysis while making the sensitivity analysis results difficult to interpret, considering that no such experiment has been conducted before. We therefore perform the experiment on a simplified case as a first attempt, to get a clearer and simpler interpretation of the results regarding both methodological and spatial aspects.
  I understand that the simulation experiment considers the application of the fungicide at the beginning of the winter period. Is this realistic? As pointed out, considering an application of fungicide at the beginning of the winter period is not very realistic. We actually suggest removing all mentions of the “winter” period, as the focus of this study is mainly methodological, based on a virtual case and realistic forcings. The chosen setup primarily aims at identifying influential factors on the different physical processes integrated in PESHMELBA, with a strong focus on lateral transfers of water and pesticides. We have therefore favoured a scenario with strong rain events since they result in both surface runoff and lateral saturated transfers in the subsurface. The results of this study thus provide general guidelines about the model behaviour, but they should be further complemented with applications to each particular agropedoclimatic context of interest.
  
  Why perform the experiments over a 3-month winter period? This is a very short time period. In this case study, the PESHMELBA time step is 1 h during dry periods and 30 minutes during or after rainfall events, resulting in a high computational cost for a three-month simulation (2 h per simulation on the cluster used to run the simulations). A longer time period was therefore not affordable for this first experiment. In addition, we chose a period characterized by a high cumulative rainfall volume to make sure that the different physical processes simulated in PESHMELBA would be activated during the simulation (we were mainly concerned with the activation of surface runoff and lateral saturated exchanges). This way, the sensitivity analysis can also be used as a consistency check on the model structure itself, allowing the simulation of the different physical processes to be checked. However, we remain aware that GSA results depend strongly on climatic conditions, as stated in the conclusion of the manuscript. As mentioned, further research may focus on other contrasting time periods to draw robust conclusions.
  A justification for the soil moisture initial condition (hydrostatic equilibrium L157) is missing. A hydrostatic equilibrium has been chosen so as to provide the model with initial conditions that are as “neutral” as possible. We wanted the variables of interest to fully represent the dynamics of the catchment and not to include any non-physical warm-up period. Another approach consists in running a warm-up simulation over a longer period, but it would imply a high computational cost that could not be afforded in this case.
  Section 2.3-2.4: I think that section 2.3 provides too many technical details that are not necessary to understand the methodology and analyses presented in the paper. The authors recognize themselves that this section could be skipped L183-184. My suggestion is to report only the main equations used to compute the sensitivity indices, while details on the derivation of these equations (that were taken from previous papers and that are therefore not really a contribution of this paper, if I understand correctly) can be moved to the supplements/appendix. I am mostly referring to the description of the Sobol’ and HSIC methods, while I think that the description of the random forest method in Section 2.4 reads very well. The main equations and references of Section 2.3 can be combined with the summary of the GSA methods provided in Section 2.5, to provide the reader only with the information that is needed to understand the methodology and the analyses, while avoiding unnecessary repetitions between Sections 2.4 and 2.5. In addition, I think that an overview of the methodology (why do you need to use the GSA methods?) is needed before introducing the specific GSA methods. The section describing the methods has been fully revised as suggested. Sections 2.3 and 2.5 have been merged, and only the main equations for each method now remain, together with a more practical interpretation of the calculated indices. We have also added a justification for the method comparison and an overview of the full methodology at the beginning of the section.
  Equation (17): The sensitivity index for a given input is the average of the first order indices estimated for the different model outputs, weighted by the outputs variance, am I correct? This paper aims to help applying these methods, therefore I think that interpreting the equations in simple (intuitive) terms, would improve readability and clarity. It is very nice to have the formal mathematical proof for the equation, but the proof does not have any practical implications and could be moved into the supplements/appendix (this is an example of how this section could be simplified, see my previous comment). Indeed, aggregated sensitivity indices correspond to an average of the Sobol’ indices on each landscape unit, weighted by the local output variances. As suggested, the proof has been removed from the main text, while a sentence has been added to qualitatively describe the formula for such indices.
  Only first order indices can be estimated for multidimensional outputs? In Figure 10 I see that also the total indices are calculated at the landscape scale. How was this done? The formulation from previous Eq. (17) can actually be applied to Sobol’ indices of any order. We have clarified the text and have explicitly mentioned the calculation of first and total order indices in Section 2.6.
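  The variance-weighted average described in this reply can be illustrated with a minimal Python sketch. It is not taken from the PESHMELBA or sensitivity analysis repositories: the function name, array layout and numerical values are hypothetical, and the local Sobol’ indices are assumed to have been estimated beforehand (for example from a PCE surrogate).

```python
import numpy as np

def aggregated_sobol(local_indices, local_variances):
    """Variance-weighted average of per-unit Sobol' indices.

    local_indices: array of shape (n_units, n_params), one row per landscape unit.
    local_variances: array of shape (n_units,), output variance on each unit.
    """
    s = np.asarray(local_indices, dtype=float)
    v = np.asarray(local_variances, dtype=float)
    if s.ndim != 2 or v.shape != (s.shape[0],):
        raise ValueError("expected (n_units, n_params) indices and (n_units,) variances")
    total_variance = v.sum()
    if total_variance <= 0.0:
        raise ValueError("the sum of local variances must be positive")
    # Each unit contributes in proportion to its share of the total output variance.
    return (v[:, None] * s).sum(axis=0) / total_variance

# Hypothetical example: 3 landscape units, 2 parameters.
local_S = np.array([[0.6, 0.1],
                    [0.2, 0.5],
                    [0.1, 0.7]])
local_var = np.array([4.0, 1.0, 0.0])  # the third unit has no output variability
aggregated = aggregated_sobol(local_S, local_var)
print(aggregated)
```

  A unit whose output does not vary (zero variance) has no weight in the aggregated index, whatever its local index. The same function applies to first-order or total-order indices, since only the weights depend on the output.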
  Equation (24): If Xi and Y are not independent, the value of the dependence measure estimated for a given bootstrap resample (that is in a way obtained by randomly attributing values of Y to each value of Xi, if I understand correctly) will tend to be larger than the dependence measure estimated for the original non-bootstrapped sample? Why? First, yes a bootstrap resample is indeed obtained by randomly attributing values of Y to each value of Xi. However, if Xi and Y are not independent, the HSIC value for such a bootstrap resample will be lower than the HSIC value for the original sample because the random resampling step breaks the existing dependence relationship. The p-value then will tend to zero.
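  The resampling argument can be sketched in Python as follows. This is an illustrative permutation-based variant, not the authors' code: it assumes Gaussian kernels with the sample standard deviation as bandwidth and the biased (V-statistic) HSIC estimator, and all function and variable names are hypothetical.

```python
import numpy as np

def gaussian_gram(v):
    """Gram matrix of a Gaussian kernel; bandwidth = sample std (an assumption)."""
    bandwidth = np.std(v)
    diff = v[:, None] - v[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))

def hsic_permutation_test(x, y, n_resamples=100, seed=0):
    """Return the biased HSIC estimate and a resampling-based p-value."""
    rng = np.random.default_rng(seed)
    n = x.size
    centering = np.eye(n) - np.ones((n, n)) / n
    k_centered = centering @ gaussian_gram(x) @ centering
    l_gram = gaussian_gram(y)
    # trace(K H L H) / n^2, written as an element-wise sum (both matrices are symmetric).
    observed = np.sum(k_centered * l_gram) / n ** 2
    exceed = 0
    for _ in range(n_resamples):
        # Randomly reassigning y values to x values breaks any existing dependence.
        perm = rng.permutation(n)
        permuted = np.sum(k_centered * l_gram[np.ix_(perm, perm)]) / n ** 2
        exceed += int(permuted >= observed)
    return observed, exceed / n_resamples

# Hypothetical data: y depends on x in a non-linear, non-monotonic way.
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, 200)
y_dependent = x ** 2 + 0.05 * rng.normal(size=200)
hsic_value, p_value = hsic_permutation_test(x, y_dependent)
print(hsic_value, p_value)
```

  When the input and output are dependent, almost no resampled HSIC value reaches the observed one, so the estimated p-value is close to zero, which is the behaviour described in the reply.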
  Section 2.4: The GSA workflow is not well explained in the text. In particular, the references to the sample sizes used are confusing. I read that 1000 points are used for PCE (L382), 4000 points for HSIC (L391), that 1000 points were derived from the 4000 points used for HSIC and that 1000 points are used for RF. It is only by looking at Figure 5 that I finally understood that these numbers are linked: 4000 points initially used for HSIC and then based on HSIC screening 1000 points are selected for all subsequent analyses. However, I am still a bit unsure why it is written L374 that ‘a variance decomposition method was first used’, isn’t it HSIC? First, a screening step is performed based on the HSIC statistical test, using a 4,000-point LHS. Once influential parameters have been identified, a new 1,000-point LHS is generated with only the influential parameters. On this new sample, Sobol’, HSIC and RF indices are compared for ranking. This description has been explicitly integrated at the beginning of Section 2.4, when merging Sections 2.3 and 2.5, with clearer references to sample sizes.
  L416 p17 ‘100 replications were used’: Why using 100 replications for bootstrapping? 1000 bootstrap resamples are typically used (e.g. Archer et al., 1997; Yang, 2011). Yes, indeed, we are aware that 1000 is a typical value for bootstrap resamples. However, such a value was not affordable for estimating HSIC measures in a reasonable computing time. We therefore preferred to use 100 replications for all the tested methods, even the ones with low computational cost. Justification for this value has been added in the text.
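  The two-stage sampling workflow can be sketched as below. This is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a PESHMELBA run, the parameters are sampled in the unit hypercube, and the screening test uses the Spearman rank-correlation p-value from SciPy as a cheap substitute for the HSIC statistical test used in the paper.

```python
import numpy as np
from scipy.stats import qmc, spearmanr

def toy_model(x):
    """Hypothetical stand-in for the simulator: only inputs 0 and 2 matter."""
    return 3.0 * x[:, 0] + x[:, 2] ** 2

N_PARAMS = 6   # hypothetical number of uncertain parameters
ALPHA = 0.05   # significance level of the screening test (an assumption)

# Stage 1: 4,000-point Latin hypercube sample over all parameters, used for screening.
screen_sample = qmc.LatinHypercube(d=N_PARAMS, seed=1).random(n=4000)
noise = np.random.default_rng(1).normal(scale=0.1, size=4000)
y_screen = toy_model(screen_sample) + noise

# Keep parameters whose dependence with the output is statistically significant.
# Spearman's test is a stand-in here; the paper relies on an HSIC-based test.
influential = [
    j for j in range(N_PARAMS)
    if spearmanr(screen_sample[:, j], y_screen).pvalue < ALPHA
]

# Stage 2: new 1,000-point Latin hypercube sample over the influential parameters only.
# The model would be re-run on this sample (screened-out inputs fixed at nominal values)
# before computing Sobol', HSIC and RF indices for ranking.
rank_sample = qmc.LatinHypercube(d=len(influential), seed=2).random(n=1000)
print(influential, rank_sample.shape)
```

  The second sample is generated from scratch in the reduced space rather than extracted from the first one, which matches the description given in this reply.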
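  The role of the number of replications can be illustrated with a minimal percentile-bootstrap sketch. The sensitivity measure used here (a squared Pearson correlation) is a deliberately cheap, hypothetical stand-in for the indices of the paper; the function names and data are assumptions.

```python
import numpy as np

def squared_correlation(x, y):
    """Cheap stand-in for a sensitivity index (not one of the paper's indices)."""
    return np.corrcoef(x, y)[0, 1] ** 2

def bootstrap_ci(index_fn, x, y, n_replications=100, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for index_fn(x, y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    replicates = np.empty(n_replications)
    for b in range(n_replications):
        # Resample (x, y) pairs with replacement, keeping each pair together.
        idx = rng.integers(0, n, size=n)
        replicates[b] = index_fn(x[idx], y[idx])
    tail = 100.0 * (1.0 - level) / 2.0
    low, high = np.percentile(replicates, [tail, 100.0 - tail])
    return low, high, replicates

# Hypothetical data set of 500 model evaluations.
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 500)
y = 2.0 * x + rng.normal(scale=0.5, size=500)
low, high, replicates = bootstrap_ci(squared_correlation, x, y, n_replications=100)
print(low, high)
```

  The cost grows linearly with `n_replications`, since the index is recomputed once per replicate; for an expensive estimator such as HSIC, going from 100 to 1000 replications multiplies the computing time by ten.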
  Table 2: I believe that the LAImin and LAIharv are missing. The Table would also need to include an additional column that specifies at which spatial level the parameters are defined (e.g. soil horizon, plot/VFS). It took me a while and a bit of digging in the manuscript to get this information. I would also add the value of the standard scenario in Table 2, this would further improve readability. As suggested, we added a column to Table 2 with spatial level definition and we also specified the values for the nominal simulation.
  Section 2.5: this section does not clearly explain that the vegetation parameters and hpond are considered for vineyard plots and VFSs separately. As already mentioned in my previous comment, I think that the parameter should be clearly introduced in Section 2.1, which would improve readability and clarity. Yes, modified as suggested
  Section 3: As mentioned in my main comments, the manuscript lacks a discussion of the methodology and results with respect to previous studies, which could be highlighted in an additional discussion section. As suggested, a discussion section has been added to comment on the global methodology and to put it into perspective in relation to previous studies.
  P463 ‘It is commonly stated that [...]’. This sentence needs to be better justified. A reference is missing (e.g. Wagener & Pianosi, 2019). It can also be that many parameters are influential, but have only a small impact on the output except for a few parameters (e.g. five or six) that dominate the output variability. Indeed, the sentence is inaccurate. By nature, the screening step does not allow conclusions to be drawn on the number of parameters that dominate the output variability. We propose to remove the sentence to avoid confusion and hasty conclusions.
  L566-568: Could you explain more why is it more costly to assess the sensitivity analysis at the local scale compared to the catchment scale? From Eq.17, it looks that anyway the catchment scale indices require the calculation of the local scale indices. Indeed, in this case study we re-use the local scale indices to calculate the aggregated ones, which implies no difference in computational cost. However, in their paper, Gamboa et al. (2014) propose an estimator for these aggregated indices that does not require the calculation of local indices. As local indices were calculated anyway in our case, we did not try this estimator, but we mention it in the text since it seems very interesting to us in the case where the user does not want to compute local indices but only the aggregated ones.
  In addition, you also suggested to modify the Data and Code availability. In order to comply with GMD Code and Data Policy, two Zenodo repositories have been created to provide both PESHMELBA source code and data. The urls and DOI have been added to the ’Code and Data Availability’ section:
  - PESHMELBA software: https://zenodo.org/record/6319769#.YinMV1TjKUk
  - Data and codes for sensitivity analysis: https://zenodo.org/record/6319773#.YinMc1TjKUk
  
  Citation: https://doi.org/10.5194/gmd-2021-425-CC3
CEC1:
'Comment on gmd-2021-425', Juan Antonio Añel, 21 Feb 2022

Dear authors,
After checking your manuscript, it has come to our attention that it does not comply with our Code and Data Policy.

https://www.geoscientific-model-development.net/policies/code_and_data_policy.html

We cannot accept embargoes such as registration or previous contact with the authors, and the code must be publicly stored in one of our accepted repositories. In this case, given that, moreover, the code of PESHMELBA is free-libre open-source software (FLOSS), nothing prevents this from being done. The same applies to the Python and Matlab scripts and code you have used.

In this way, you must reply to this comment with the link to the repository used in your manuscript, with its DOI.

Moreover, please, for the sake of reproducibility, include in the text the version number of all the software that you have used: For example, UQLab version 2.0.0.

Also, you must include in a potential reviewed version of your manuscript the modified 'Code and Data Availability' section and the DOI of the code.

Please, reply as soon as possible to this comment with the link to the repository so that it is available for the peer-review process, as it should be.
Juan A. Añel

Geosci. Model Dev. Exec. Editor

Citation: https://doi.org/10.5194/gmd-2021-425-CEC1
- CC1:
  'Reply on CEC1', Emilie Rouzies, 08 Mar 2022
  
  Dear editor,
  As requested in your last comment, please find below the Zenodo links to the software sources, input data and codes for sensitivity analysis used in this manuscript:
  - PESHMELBA software: https://zenodo.org/deposit/6319769#
  
  - Data and sensitivity analysis codes: https://zenodo.org/record/6319773#.YidgSlTjKUk
  The DOI notices have not been published yet. Is it acceptable to provide the temporary links (listed below) and to make them public when the revision process is over ?
  Links to DOI notices:
  - PESHMELBA software: https://data.inrae.fr/dataset.xhtml?persistentId=doi%3A10.15454%2F2HAU8R&version=DRAFT
  
  - Data and sensitivity analysis codes: https://data.inrae.fr/dataset.xhtml?persistentId=doi%3A10.15454%2F2YVY4O&version=DRAFT
  Also, these links and DOIs will be included in the next reviewed version of the manuscript.
  Regards,
  Emilie Rouzies
  
  Citation: https://doi.org/10.5194/gmd-2021-425-CC1
  - CEC2: 'Reply on CC1', Juan Antonio Añel, 08 Mar 2022
    
    Dear authors,
    First of all, I can not access the Zenodo repository you have linked for PESHMELBA, as it asks me for a login. The Zenodo repository must be open and accessible to anyone.
    
    About the DOIs: I do not understand what you mean or why you continue using the data.inrae.fr servers. The DOI is already issued in the Zenodo repository for the Data (check the column on the right side of the webpage, it is 10.15454/2YVY4O), and yes, you must include it in any potential new version, not only in the final accepted paper. Think, for example, that your paper might not be finally accepted for publication. However, the Discussions paper would continue to be public on the webpage of Geoscientific Model Development Discussions. The Discussions paper has to comply with its reproducibility standards, and without the DOI or the permanent storage of the materials this is not possible. Also, the final repository could be different from the current one, as potential changes requested by the reviewers and editors could make it necessary to produce new software or data. This is another reason to keep separate repositories (if necessary) for the review process and the final accepted paper.
    
    Therefore, please, deposit the PESHMELBA code in an open repository and link here the DOIs issued by Zenodo (and include them in any revised version of your paper).
    Regards,
    Juan A. Añel
    Geosci. Model Dev. Executive Editor
    
    Citation: https://doi.org/10.5194/gmd-2021-425-CEC2
    
    CC2: 'Reply on CEC2', Emilie Rouzies, 10 Mar 2022
    
    Dear editor,
    I indeed made a mistake when publishing the Zenodo link associated to the PESHMELBA software. Please find below the right url. The DOI notices associated to both repositories have also been made public :
    PESHMELBA software: https://zenodo.org/record/6319769#.YinMV1TjKUk
    Data and codes for sensitivity analysis: https://zenodo.org/record/6319773#.YinMc1TjKUk
    Sorry for the mistake; please let me know if you have any problem accessing the repositories,
    Regards,
    Emilie Rouzies
    
    Citation: https://doi.org/10.5194/gmd-2021-425-CC2
RC2:
'Comment on gmd-2021-425', Heng Dai, 14 Apr 2022
Report on the manuscript “How to perform global sensitivity analysis of a catchment-scale, distributed pesticide transfer model? Application to the PESHMELBA model.” submitted to Geoscientific Model Development.

This paper explored how global sensitivity analysis can be applied to the PESHMELBA pesticide transfer model to quantify uncertainties on transfer simulations. Three different sensitivity analysis approaches (Sobol’ indices obtained from Polynomial Chaos Expansion, HSIC dependence measures and feature importance measures obtained from Random Forest surrogate model) have been implemented into a test case of virtual catchment to compare their performances.

I believe this paper is well written with high quality and good logic. However, some points of this paper need to be clarified and more discussions are needed, as listed below.

Major Comments:

The novelty of this research needs more emphasis since the methods and algorithms are not new and the application of global sensitivity analysis to complex large-scale models is also not new (see Dai et al., 2017).

The reasons for comparing these three different sensitivity analysis methods need more discussion. Some conclusions on the differences between these three methods are too obvious (e.g., that the Sobol’ method can account for interactions).

The screening procedure is unclear, what methods were used? The standard procedure is to use the Morris method or other low computational cost sensitivity analysis methods.

The description of aggregated sensitivity indices is ambiguous, and the advantage of using it is not convincing.
Citation: https://doi.org/10.5194/gmd-2021-425-RC2
AC1: 'Final response on gmd-2021-425', Emilie Rouzies, 12 May 2022

All the comments and questions of Reviewers 1 and 2 have been copied hereafter in bold. Please note that we will upload the revised manuscript as a supplement to these response comments within a few days.

Reviewer 1
Thank you very much for the careful review and edits to the initial submission. We have already provided responses to the detailed comments during the discussion period. In what follows, we address the main comments and reproduce our responses to the detailed comments. All your comments and questions have been copied hereafter in bold. We have also revised the manuscript accordingly.
Main comments
• The method section focuses too much on the technical details of the sensitivity analysis methods (which are not new methods, but methods taken from previous studies). The methodology (how these methods are used) is not well explained, which I think should be more the focus of the paper.
As suggested, the structure of the paper has been substantially modified to give a clearer overview of the full methodology, with fewer technical details about the sensitivity analysis methods and more practical considerations on how they are used. To do so, Section 2.3 and Section 2.5 have been merged and shortened to keep only the main equations for each method. An overview of the full methodology, as well as a justification for using and comparing several methods, has also been added at the beginning of Section 2.4.
• Some clarifications are needed regarding the setup of the case study (Section 2.2).
Following your detailed comments, the model setup section has been modified to provide readers with a clearer description of the parameters considered in this study. In particular, the number of parameters involved in the sensitivity analysis is now specified earlier in the text. The table from Section 2.5 also comes earlier and has been enriched with a description of the different categories of parameters.
• The manuscript lacks a discussion of the methodology and results with respect to previous studies. This would help to clarify the novelty of the study. In particular:
– The authors highlight that this is the first sensitivity analysis applied to the PESHMELBA model (e.g. L588 L26), but sensitivity analysis was applied to other pesticide models (e.g. Dubus et al., 2003; Hong & Purucker, 2018...). The manuscript lacks a review on previous sensitivity analyses (local, global) applied to pesticide models.
As suggested, a review on sensitivity analysis applied to pesticide models has been added in the introduction. It covers both the different approaches (local vs. global) and the use of a screening method to decrease the dimension of the problem.
– It is also not clear to what extent the methodology for sensitivity analysis proposed in the manuscript is new compared to previous sensitivity analysis studies. In this respect, previous studies have also proposed to use first a computationally cheaper sensitivity analysis method (method that requires a relatively low number of model simulations, such as the Morris Elementary Effect Test) to screen non-influential inputs, before applying a computationally more expensive method (e.g. Sobol’ Variance Based method) based on the subset of influential inputs (e.g. Garcia et al., 2019; Vanuytrecht et al., 2014). This could be discussed in the manuscript.
Indeed, the methodology we followed to perform sensitivity analysis in this study is a classical approach: first, a screening step and second, a ranking step applied to the reduced set of input parameters. However, for both steps, the specificities of the application (high number of input parameters and high computational cost of a PESHMELBA simulation) prevented us from using classical methods. Alternative approaches that are recent and, up to now, rarely applied to pesticide model analysis were therefore necessary. In addition, combining several ranking methods with different definitions of sensitivity to get a robust overview of influential parameters is also new. A discussion section about the full methodology has been added to argue these points.
• I wish to point out that the PESHMELBA model, as well as the code to compute the HSIC sensitivity indices are not publicly available, but are available upon request from the corresponding author (Code and data availability section). To advance open science (and to comply with the GMD guidelines?), I think that it would be valuable to make these resources openly available, especially since the paper has a methodological focus.
In order to comply with the GMD Code and Data Policy, two Zenodo repositories have been created to provide both the PESHMELBA source code and the data. The URLs and DOIs have been added to the ’Code and Data Availability’ section:
- PESHMELBA software: https://zenodo.org/record/6319769#.YinMV1TjKUk
- Data and codes for sensitivity analysis: https://zenodo.org/record/6319773#.YinMc1TjKUk
Detailed comments
L21 p1 ‘simple enough to ensure flexibility’: More explanation is needed here. This is vague and I am not sure what is meant by flexibility.
Here we mean that models used to support decision-making should be designed so that users could easily modify the code to integrate new physical processes and/or adapt the existing ones. “Flexibility” then refers to the structure of these tools that should be ideally simple enough to enable such evolutions. The sentence has been clarified in the manuscript.
L30-31 p2 ‘catchment-scale model [...] afforded’: Specify that this is spatially distributed models.
Yes, added as suggested.
L61-73 p3: Also note the recent study of Smith et al. (2021).
Yes, added as suggested.
Section 2.1: a presentation of the model parameters is missing. How many uncertain parameters that need to be estimated are there? What are the different categories of parameters (e.g. soil, pesticide, vegetation etc., as I can read in Table 2)? Parameters are only introduced much later in Section 2.5 (Table 2), which makes it difficult to follow Section 2.2 that describes the selection of the parameter values. The reference to Table 2 in the caption of Table 1 does not flow well.
The model setup section has been substantially modified to present the model parameters much earlier than in the original manuscript. The table from Section 2.5 has been modified and moved earlier to introduce all model parameters and their categories. A sentence has also been added to the text to specify the number of uncertain parameters involved in the sensitivity analysis.
Section 2.2: Why perform the experiment on a virtual catchment and not a ‘real’ one?
As mentioned in the text, the final target catchment for this study is the real La Morcille catchment. Figure 1 (left) depicts the PESHMELBA mesh at this scale, showing that such an application results in a high number of landscape elements (>500). Conducting experiments at the full catchment scale would have drastically increased the computational cost of the analysis and made the sensitivity analysis results difficult to interpret, considering that no such experiment has been conducted before. We therefore performed the experiment on a simplified case as a first attempt, to get a clearer and simpler interpretation of the results regarding both methodological and spatial aspects.
I understand that the simulation experiment considers the application of the fungicide at the beginning of the winter period. Is this realistic?
As pointed out, considering an application of fungicide at the beginning of the winter period is not very realistic. We therefore suggest removing all mentions of the “winter” period, as the focus of this study is mainly methodological, based on a virtual case and realistic forcings. The chosen setup primarily aims at identifying influential factors on the different physical processes integrated in PESHMELBA, with a strong focus on lateral transfers of water and pesticides. We therefore favoured a scenario with strong rain events, since they result in both surface runoff and lateral saturated transfers in the subsurface. The results of this study thus provide general guidelines about the model behaviour, but they should be complemented with applications to each particular agropedoclimatic context of interest.
Why perform the experiments over a 3-month winter period? This is a very short time period.
In this case study, the PESHMELBA time step is 1 h during dry periods and 30 min during or after rainfall events, resulting in a high computational cost for a three-month simulation (2 h per simulation on the cluster used to run the simulations). A longer time period was therefore not affordable for this first experiment. In addition, we chose a period characterized by a high cumulative rainfall volume to make sure that the different physical processes simulated in PESHMELBA would be activated during the simulation (we were mainly concerned with the activation of surface runoff and lateral saturated exchanges). This way, the sensitivity analysis can also be used as a consistency check on the model structure itself, allowing the simulation of the different physical processes to be checked. However, we remain aware that GSA results depend strongly on climatic conditions, as stated in the conclusion of the manuscript. As mentioned, further research may focus on other contrasting time periods to draw robust conclusions.
A justification for the soil moisture initial condition (hydrostatic equilibrium L157) is missing.
A hydrostatic equilibrium was chosen to provide the model with initial conditions that are as “neutral” as possible. We wanted the variables of interest to fully represent the dynamics of the catchment and not to include any non-physical warm-up period. An alternative approach consists in running a warm-up simulation over a longer period, but it would imply a high computational cost that could not be afforded in this case.
Section 2.3-2.4: I think that section 2.3 provides too many technical details that are not necessary to understand the methodology and analyses presented in the paper. The authors recognize themselves that this section could be skipped L183-184. My suggestion is to report only the main equations used to compute the sensitivity indices, while details on the derivation of these equations (that were taken from previous papers and that are therefore not really a contribution of this paper, if I understand correctly) can be moved to the supplements/appendix. I am mostly referring to the description of the Sobol’ and HSIC methods, while I think that the description of the random forest method in Section 2.4 reads very well. The main equations and references of Section 2.3 can be combined with the summary of the GSA methods provided in Section 2.5, to provide the reader only with the information that is needed to understand the methodology and the analyses, while avoiding unnecessary repetitions between Sections 2.4 and 2.5. In addition, I think that an overview of the methodology (why do you need to use the GSA methods?) is needed before introducing the specific GSA methods.
The method description section has been fully revised as suggested. Sections 2.3 and 2.5 have been merged, and only the main equations for each method now remain, together with a more practical interpretation of the calculated indices. We have also added a justification for the method comparison and an overview of the full methodology at the beginning of the section.
Equation (17): The sensitivity index for a given input is the average of the first order indices estimated for the different model outputs, weighted by the outputs variance, am I correct? This paper aims to help applying these methods, therefore I think that interpreting the equations in simple (intuitive) terms, would improve readability and clarity. It is very nice to have the formal mathematical proof for the equation, but the proof does not have any practical implications and could be moved into the supplements/appendix (this is an example of how this section could be simplified, see my previous comment).
Indeed, aggregated sensitivity indices correspond to an average of the Sobol’ indices on each landscape unit, weighted by the local output variances. As suggested, the proof has been removed from the main text, while a sentence has been added to qualitatively describe the formula for such indices.
Only first order indices can be estimated for multidimensional outputs? In Figure 10 I see that also the total indices are calculated at the landscape scale. How was this done?
The formulation from the previous Eq. (17) can actually be applied to Sobol’ indices of any order. We have clarified the text and explicitly mentioned the calculation of first- and total-order indices in Section 2.6.
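The variance-weighted average described in this response can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `aggregate_sobol`, the array layout (one row per landscape unit, one column per parameter) and all numerical values are hypothetical.

```python
import numpy as np

def aggregate_sobol(local_indices, output_variances):
    """Variance-weighted average of local Sobol' indices.

    local_indices: array (n_units, n_params), index of each parameter on each
                   landscape unit (first or total order, treated identically).
    output_variances: array (n_units,), output variance on each unit.
    """
    local_indices = np.asarray(local_indices, dtype=float)
    weights = np.asarray(output_variances, dtype=float)
    # Units with a large output variance dominate the aggregated index.
    return weights @ local_indices / weights.sum()

# Hypothetical values: 3 landscape units, 2 parameters.
S_local = np.array([[0.8, 0.1],
                    [0.2, 0.6],
                    [0.5, 0.5]])
var_local = np.array([1.0, 3.0, 0.0])  # the third unit has a constant output
S_agg = aggregate_sobol(S_local, var_local)
```

With these made-up numbers, the third unit does not contribute at all because its output variance is zero, and the second unit weighs three times as much as the first.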
Equation (24): If Xi and Y are not independent, the value of the dependence measure estimated for a given bootstrap resample (that is in a way obtained by randomly attributing values of Y to each value of Xi, if I understand correctly) will tend to be larger than the dependence measure estimated for the original non-bootstrapped sample? Why?
First, yes a bootstrap resample is indeed obtained by randomly attributing values of Y to each value of Xi. However, if Xi and Y are not independent, the HSIC value for such a bootstrap resample will be lower than the HSIC value for the original sample because the random resampling step breaks the existing dependence relationship. The p-value then will tend to zero.
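The resampling argument above can be illustrated with a small permutation test. The sketch below is an assumption-laden toy example, not the code from the Zenodo repository: it uses a biased HSIC estimator with Gaussian kernels whose bandwidth is set to the sample standard deviation, and the helper names (`gaussian_gram`, `hsic`, `hsic_pvalue`) and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_gram(v):
    """Gaussian-kernel Gram matrix of a 1-D sample (bandwidth = sample std, an assumption)."""
    v = np.asarray(v, dtype=float)
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * v.std() ** 2))

def hsic(x, y):
    """Biased HSIC estimator: trace(K H L H) / n^2, with H the centering matrix."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = gaussian_gram(x), gaussian_gram(y)
    return np.trace(K @ H @ L @ H) / n**2

def hsic_pvalue(x, y, n_resamples=100, seed=0):
    """Share of resamples (Y randomly reassigned to X) whose HSIC reaches the observed value."""
    rng = np.random.default_rng(seed)
    observed = hsic(x, y)
    resampled = np.array([hsic(x, rng.permutation(y)) for _ in range(n_resamples)])
    return float(np.mean(resampled >= observed))

# Synthetic data: one output depends non-linearly on x, the other does not.
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, 200)
y_dependent = x**2 + 0.05 * rng.normal(size=200)
y_independent = rng.normal(size=200)

p_dependent = hsic_pvalue(x, y_dependent)
p_independent = hsic_pvalue(x, y_independent)
```

For the dependent pair, shuffling Y destroys the relationship, so the resampled HSIC values fall below the observed one and the p-value is close to zero. For the independent pair, the observed value is just one draw among the resampled ones, so no small p-value is expected in general.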
Section 2.4: The GSA workflow is not well explained in the text. In particular, the references to the sample sizes used are confusing. I read that 1000 points are used for PCE (L382), 4000 points for HSIC (L391), that 1000 points were derived from the 4000 points used for HSIC and that 1000 points are used for RF. It is only by looking at Figure 5 that I finally understood that these numbers are linked: 4000 points initially used for HSIC and then based on HSIC screening 1000 points are selected for all subsequent analyses. However, I am still a bit unsure why it is written L374 that ‘a variance decomposition method was first used’, isn’t it HSIC?
First, a screening step is performed based on a statistical independence test using HSIC, computed from a 4,000-point LHS. Once the influential parameters have been identified, a new 1,000-point LHS is generated with only the influential parameters. On this new sample, Sobol’, HSIC and RF indices are compared for ranking. This description has been explicitly integrated at the beginning of Section 2.4, when merging Sections 2.3 and 2.5, with clearer references to sample sizes.
L416 p17 ‘100 replications were used’: Why use 100 replications for bootstrapping? 1000 bootstrap resamples are typically used (e.g. Archer et al., 1997; Yang, 2011).
Yes, indeed, we are aware that 1000 is a typical value for bootstrap resamples. However, such a value was not affordable for estimating HSIC measures in a reasonable computing time. We therefore preferred to use 100 replications for all the tested methods, even those with a low computational cost. A justification for this value has been added to the text.
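The two-stage sampling described in this response can be sketched as follows. This is a minimal illustration using SciPy's quasi-Monte Carlo module, not the authors' workflow: the number of parameters, the screening outcome and the physical bounds are all hypothetical placeholders, and the model runs and the independence test that would sit between the two stages are omitted.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical dimension: the real parameter count is given in the manuscript.
n_params = 10

# Stage 1: 4,000-point LHS over all parameters, used for HSIC-based screening.
screening_design = qmc.LatinHypercube(d=n_params, seed=0).random(n=4000)

# Placeholder for the screening outcome (True = parameter kept as influential).
# In practice this mask would come from the independence test, not be hard-coded.
influential = np.zeros(n_params, dtype=bool)
influential[[0, 2, 3, 6]] = True

# Stage 2: new 1,000-point LHS restricted to the influential parameters.
n_kept = int(influential.sum())
ranking_unit = qmc.LatinHypercube(d=n_kept, seed=1).random(n=1000)

# Rescale from the unit hypercube to (hypothetical) physical bounds.
lower = np.full(n_kept, 0.1)
upper = np.full(n_kept, 2.0)
ranking_design = qmc.scale(ranking_unit, lower, upper)
```

Drawing a fresh design in the reduced space keeps the Latin hypercube stratification in every retained dimension, at the price of additional model runs.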
Table 2: I believe that the LAImin and LAIharv are missing. The Table would also need to include an additional column that specifies at which spatial level the parameters are defined (e.g. soil horizon, plot/VFS). It took me a while and a bit of digging in the manuscript to get this information. I would also add the value of the standard scenario in Table 2, this would further improve readability.
As suggested, we added a column to Table 2 with spatial level definition and we also specified the values for the nominal simulation.
Section 2.5: this section does not clearly explain that the vegetation parameters and hpond are considered for vineyard plots and VFSs separately. As already mentioned in my previous comment, I think that the parameter should be clearly introduced in Section 2.1, which would improve readability and clarity.
Yes, modified as suggested
Section 3: As mentioned in my main comments, the manuscript lacks a discussion of the methodology and results with respect to previous studies, which could be highlighted in an additional discussion section.
As suggested, a discussion section has been added to comment on the global methodology and to put it into perspective in relation to previous studies.
P463 ‘It is commonly stated that [...]’. This sentence needs to be better justified. A reference is missing (e.g. Wagener & Pianosi, 2019). It can also be that many parameters are influential, but have only a small impact on the output except for a few parameters (e.g. five or six) that dominate the output variability.
Indeed, the sentence is inaccurate. The screening step intrinsically does not allow conclusions to be drawn on the number of parameters that dominate the output variability. We propose to remove the sentence to avoid confusion and hasty conclusions.
L566-568: Could you explain more why is it more costly to assess the sensitivity analysis at the local scale compared to the catchment scale? From Eq.17, it looks that anyway the catchment scale indices require the calculation of the local scale indices.
Indeed, in this case study we re-use the local-scale indices to calculate the aggregated ones, which implies no difference in computational cost. However, Gamboa et al. (2014) propose an estimator for these aggregated indices that does not require the calculation of local indices. As local indices were calculated anyway in our case, we did not try this estimator, but we mention it in the text since it seems very useful when the user does not want to compute local indices but only the aggregated ones.
Reviewer 2
Thank you very much for the careful review and edits to the initial submission. Below we address the comments raised and we have also revised the manuscript accordingly to accommodate these.
Main comments
The novelty of this research needs more emphasis since the methods and algorithms are not new and the application of global sensitivity analysis to complex large-scale models is also not new (see Dai et al., 2017).
Indeed, the methodology we follow to perform sensitivity analysis in this study is a classical approach: first, a screening step and second, a ranking step applied on the reduced set of input parameters. However, the combined specificities of the application (high number of input parameters and high computational cost of a PESHMELBA simulation) are very limiting to perform each step. Alternative approaches that require limited sample sizes and that have been, up to now, poorly applied to pesticide model analysis are then necessary both for screening and ranking. In addition, combining several ranking methods with different definitions of sensitivity to get a comprehensive overview of influential parameters is also new. A discussion section about the full methodology has been added to argue on these points and to emphasize the novelty of this research.
The reasons for comparing these three sensitivity analysis methods need more discussion. Some conclusions about the differences between the three methods are too obvious (e.g., that the Sobol’ method can account for interactions).
Rather than only comparing methods, in this study we assume that combining different sensitivity analysis methods with contrasting definitions of sensitivity makes it possible to build a robust and comprehensive overview of the parameters that influence complex variables. For instance, using the HSIC dependence measure may help identify parameters that influence quantities other than the second moment. This approach may be of particular interest for the variables considered in this case study, as they result from the interactions of various physical processes and might be bimodal or highly skewed. However, as implementing several methods may not be possible in every case study, comparing these methods regarding the information they provide, their accuracy and their ease of implementation may also help future users choose the approach best suited to their case study. This justification has been added at the beginning of Section 2.4, and the full argumentation has been modified so as not to consider only the comparison of methods but also to justify combining them. In addition, conclusions on the differences (or the lack of differences) between them have been consolidated by referring to the differences in the definitions of sensitivity they rely on.
The screening procedure is unclear: what methods were used? The standard procedure is to use the Morris method or other low computational cost sensitivity analysis methods.
In this study, the Morris method could not be used due to 1) the high number of input parameters, which led to fuzzy visual clustering, and 2) the computational cost of a simulation, which prevented us from running a large number of trajectories (see the discussion of the revised manuscript for references to several studies showing that a large number of trajectories is necessary to get robust screening results). Instead, we used a statistical test for independence based on the HSIC measure. A mention of the screening based on a statistical test has been added at the beginning of Section 2.4, while the justification for not using the classical Morris method is provided in the discussion section.
The description of aggregated sensitivity indices is ambiguous, and the advantage of using them is not convincing.
The main justification for using such aggregated indices is to provide a summary of the overall sensitivity, especially to better target the calibration effort. Also, such aggregated indices can be estimated directly, without performing a local GSA on each landscape element. This way, they can provide a rough sensitivity indicator if the computational budget is not sufficient for computing local indices. The justification for using them has been clarified in the manuscript. Also, the proof of the aggregated index formulation has been replaced by a qualitative description to improve clarity and readability.

Citation: https://doi.org/10.5194/gmd-2021-425-AC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Emilie Rouzies on behalf of the Authors (20 Jun 2022) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (20 Jun 2022) by Charles Onyutha

RR by Fanny Sarrazin (01 Jul 2022)

Suggestions for revision or reasons for rejection

The authors extensively revised the manuscript and did a good job in addressing my comments. The manuscript now reads mostly very well. The authors have also created two zenodo repositories which contain the model code as well as the code used to perform the sensitivity analyses, and which are clearly documented in two README.txt files.

I have a remaining comment regarding the input samples. The authors created an initial LHS sample of size 4000 to screen the influential parameters. Then they ranked these influential parameters based on a new sample of size 1000. Why is it required to create a new sample for the ranking? Could not the initial sample of size 4000 be used for this purpose (only considering the influential inputs and dropping the dimensions corresponding to the non-influential inputs)? I think this should be clarified in the manuscript, since creating a new sample largely increases the computational cost of the analyses.

In addition, I report below small comments:

p3 L60-61 “Such approach […] information on the input.”: this sentence is not clear and needs reformulation. Do you mean that this is a GSA method that can be applied ‘given data’, and does not require a specific structure for the input sample (see e.g. Saltelli et al., 2021)?

p3 L67: The model used in Vanuytrecht et al. (2014) is not a pesticide model, is it?

p3 L69-70 “This qualitative method is based […] to make clusters appear.”: references are missing to support this statement. For instance, Kim et al. (2022, Sect. 4.5) discusses the difficulties in applying Morris method in high dimensions.

p11 L239 ‘accuracy’: do you mean precision/robustness (assessed using bootstrapping)? I do not think from your analysis you can infer whether sensitivity indices are accurate. Can we know the ‘true’ value of the sensitivity indices?

p11 L254 “accurately”: I would remove this term, which I think may be misleading. Sobol’ indices are typically calculated numerically and not analytically. Can we be sure that the numerical procedure produces accurate sensitivity indices estimates?

p19 L454-455 ‘It may indicate […] output variance.’: This sentence needs reformulation.

p24 L548: I think this sentence needs clarification. The term ‘relevancy’ is vague. The term ‘confidence’ is a strong word and I do not think it is appropriate here, given the limitation of these methods discussed in the results section.

p25 L551-552: To put this into context, I suggest to refer to Saltelli et al. (2021), who highlights the benefit of ‘given data’ sensitivity analysis.

Minor edits:

p1 L16: add ‘ecosystems’ after ‘aquatic’.
p2 L40: replace ‘vary‘ and ‘are‘ by ‘varies’ and ‘is’, respectively.
p3 L72: I would start a new paragraph here.
p3 L76 ‘synthetic’: what do you mean? Aggregate? Or maybe this term can simply be removed?
p9 L172: there is a typo in ‘availability’
p11 L239: replace ‘it provides’ by ‘they provide’
p13 L270: there is a typo in “circumvent”
p13 L281: replace ‘computed’ by ‘compute’
p18 L404: remove ‘on’ after ‘screening’.
p25 L558: I suggest to replace ‘used’ by ‘performed’ (the verb ‘use’ appears twice in the sentence)

References

Kim, A., Mutel, C., & Froemelt, A. (2022). Robust high-dimensional screening. Environmental Modelling and Software, 148, 105270. https://doi.org/10.1016/j.envsoft.2021.105270

Saltelli, A., Jakeman, A., Razavi, S., & Wu, Q. (2021). Sensitivity analysis: A discipline coming of age. Environmental Modelling and Software, 146, 105226. https://doi.org/10.1016/j.envsoft.2021.105226

RR by Razi Sheikholeslami (24 Aug 2022)

Suggestions for revision or reasons for rejection

Overall evaluation: The submitted manuscript entitled "How to perform global sensitivity analysis of a catchment-scale, distributed pesticide transfer model? Application to the PESHMELBA model." by Rouzies et al. applies three GSA methods to evaluate the sensitivity of the distributed process-based model to its parameters. The writing is clear and precise, and all sections are understandable. Considering the importance of such analysis for complex hydrologic models, I think the motivation and benefits of this study will be of interest to Geoscientific Model Development readers. Particularly, I like the fact that various GSA methods have been compared in this paper. That being said, the manuscript suffers from some major shortcomings with respect to its novelty and rigor. Here, I outline my comments and suggestions that should allow authors to improve their paper:

Comment 1. The major shortcoming of the paper is that the overall value of this contribution to the hydrologic modelling community is not adequately discussed. The main contribution of this study is applying three GSA techniques to investigate the role of various parameters in a pesticide transfer model. However, its merit over previous attempts is still somewhat limited or not well presented. As mentioned by the authors, there are several studies where a GSA approach has been applied to explore factor importance in this context. I am not sure if this and similar studies would add much useful information to the existing body of knowledge on uncertainty analysis, parameter estimation, identifiability analysis, etc. I strongly suggest that the authors clearly explain the extent to which this study adds to the previously presented knowledge in the field (e.g., through new approaches to solving existing problems? etc.).

Having discussed the issue from that point of view, I would rather look at it from another perspective as well. Based on the reported results (Figure 7), overall, the estimated sensitivity indices by RF, HSIC, and Sobol methods are quite different. But, it is not convincing from the paper why one should use HSIC instead of Sobol or RF method. The manuscript correctly mentions the conceptual differences between three methods. For example, HSIC assesses the strength of dependencies between inputs and the output, while Sobol method attributes the variance of the output to variations in inputs or sets of inputs. However, it has not been discussed how this can help modelers/hydrologists with respect to hydrological processes’ understanding or model building. To address this issue, I strongly suggest authors provide their “objectives” and “research questions” in the introduction section by bullet points. This can properly highlight the novelty and significance of the study. Furthermore, considering the numerical results, authors should explicitly explain why and how each GSA method might be useful in the context of spatialized pesticide transfer modelling.

Comment 2. In my opinion, another major shortcoming of this paper is that there is no information about the convergence behavior of the GSA algorithms. As the authors know, robust sensitivity analysis of models typically requires many model runs, and hence considerable computational resources. So, due to the high number of model evaluations required by existing sensitivity analysis techniques and the computationally expensive nature of the models, analysts usually tend to conduct sensitivity analysis without evaluating its stability and convergence (for more discussion see, e.g., Sarrazin et al., 2016; Sheikholeslami et al., 2020). It is therefore common to choose the sample size only based on the available computational budget, which in turn can result in a lack of robustness, and consequently contaminate the assessment of the sensitivities. In fact, over the last 5 to 10 years, a surge of papers has flooded the environmental modelling journals introducing or applying a sensitivity analysis technique to a model without analyzing the robustness and convergence of the results. The authors should properly monitor and analyze the convergence properties of the utilized GSA techniques in identifying influential factors, for example by progressively increasing the sample size.
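As an illustration of what monitoring convergence by progressively increasing the sample size can look like, the toy sketch below tracks the width of a bootstrap confidence interval on a sensitivity index as the sample grows. It is a sketch under strong assumptions and not tied to the manuscript: the model is a hypothetical linear function, for which the first-order index of an input equals its squared correlation with the output, and the function names are invented for the example.

```python
import numpy as np

def first_order_linear(x, y):
    """First-order index of x for a linear additive model: squared Pearson correlation."""
    return np.corrcoef(x, y)[0, 1] ** 2

def bootstrap_ci_width(x, y, n_boot=200, seed=None):
    """Width of the 95 % bootstrap interval of the index estimate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample (x, y) pairs with replacement
        estimates.append(first_order_linear(x[idx], y[idx]))
    low, high = np.percentile(estimates, [2.5, 97.5])
    return high - low

# Hypothetical toy model with two independent uniform inputs.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1600, 2))
Y = 2.0 * X[:, 0] + X[:, 1]  # analytical first-order index of input 0: 4 / 5

# Nested sub-samples of increasing size.
sizes = [100, 400, 1600]
widths = [bootstrap_ci_width(X[:n, 0], Y[:n], seed=1) for n in sizes]
estimate = first_order_linear(X[:, 0], Y)
```

The interval width shrinking as the sample grows is one possible convergence diagnostic; what counts as narrow enough remains a judgment that depends on the purpose of the analysis (screening versus ranking).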

Comment 3. Another important cost-effective strategy in the literature for accelerating GSA of computationally expensive models is the given-data approach to GSA (otherwise known as data-driven methods). To improve the literature review and strengthen the discussion, the authors could mention the given-data approach in the revised manuscript. For a general review and discussion of these techniques, see Sheikholeslami et al. (2021).

Comment 4. Going back to comment 3, I think the review of the state of the art in this study is insufficient. Many previous studies have developed efficient screening techniques. The authors should consider the existing literature in this context and perform a critical review. One notable example is the grouping approach introduced by Sheikholeslami et al. (2019). This approach uses agglomerative hierarchical clustering to categorize the parameters into distinct groups based on similarities between their sensitivity indices, and then ranks parameters according to importance group (e.g., these could be labeled as “strongly influential”, “influential”, “moderately influential”, “weakly influential”, and “non-influential”) rather than individually (see Huo et al., 2019; Sheikholeslami et al., 2021 for further applications of the grouping-based importance ranking approach). Other studies include Tang et al., 2007; Nossent et al., 2011; Touzani and Busby, 2014; Becker et al., 2018; etc.
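
As a rough illustration of the grouping idea, the following sketch clusters parameters by the similarity of their sensitivity indices using a simple one-dimensional agglomerative scheme (repeatedly merging the two neighbouring clusters whose mean indices are closest). It is a simplified stand-in and not the algorithm of Sheikholeslami et al. (2019); the parameter names `p1` to `p6` and the index values are hypothetical and are not taken from the manuscript.

```python
def group_by_sensitivity(indices, n_groups):
    """Agglomerative grouping of parameters by sensitivity index (1-D sketch)."""
    # Start with one cluster per parameter, ordered from most to least influential.
    clusters = [[name] for name in sorted(indices, key=indices.get, reverse=True)]

    def mean(cluster):
        return sum(indices[name] for name in cluster) / len(cluster)

    while len(clusters) > n_groups:
        # Clusters stay ordered by mean index, so the closest pair is adjacent.
        gaps = [mean(clusters[i]) - mean(clusters[i + 1])
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


# Hypothetical sensitivity indices, for illustration only.
example_indices = {"p1": 0.52, "p2": 0.47, "p3": 0.11,
                   "p4": 0.09, "p5": 0.004, "p6": 0.001}
labels = ["strongly influential", "moderately influential", "non-influential"]

groups = group_by_sensitivity(example_indices, n_groups=len(labels))
for label, members in zip(labels, groups):
    print(label, members)
```

Reporting groups rather than individual ranks makes the result less sensitive to small estimation errors between parameters with nearly identical indices.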

Comment 5. While “parameter uncertainty” has been thoroughly analyzed in the paper, I could not find a proper description of other important sources of uncertainty, particularly input data uncertainties, e.g., soil type, rain and PET forcing data. I recommend that the authors add a discussion of this important source of uncertainty, which can significantly affect the model output variability. In fact, these forcings are typically the outputs of long and complex modelling chains. Thus, PESHMELBA may simultaneously suffer from model parameter, model structure, and input uncertainties, as well as other systematic uncertainties in precipitation bias correction, the estimation of potential evapotranspiration, or the derivation of spatial basin-scale meteorological input data.

Comment 6. Most of the existing literature on sensitivity analysis assumes that the controlling factors, such as model parameters (processes), are independent, whereas in many cases they are correlated and their joint distribution can take a variety of forms. However, very few studies in the field of water and environmental modeling address this issue. By way of example, Strobl et al. (2007) reported that using the permutation-based mean decrease in prediction accuracy as an importance measure can bias the estimated importance of correlated variables. The authors should highlight this in the revised manuscript by adding a discussion of the significance of correlation effects in the methods used, and then perhaps propose strategies (for future studies) for properly accounting for correlations in the parameter (process) space.

Comment 7. I could not find any information on the training and tuning of the RF model. The possible inconsistency in the SA results might be due to issues in fitting the RF to the input-output data. I strongly suggest that the authors provide details on how the RF model was built. Furthermore, it is not clear whether the RF was fitted on scalar quantities or time series. Without this information, the results are not reliable and cannot be validated.

Comment 8. It would be interesting to see the results of parameter ranking as well. Although these methods estimate different values for the sensitivity indices in some cases, the rankings they provide may be much more similar. Note that, in complex models with a very large number of parameters, we are typically not interested in the exact values of the sensitivity indices. Instead, it may be more profitable to use the available computational budget to rank parameters in order of importance, e.g., “strongly influential”, “moderately influential”, and “non-influential”.
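
One simple way to quantify the agreement between rankings is Spearman's rank correlation between the indices from two methods. The sketch below assumes there are no ties and uses made-up index values for four parameters; the variable names `sobol`, `hsic` and `rf` are labels for illustration only and the numbers are not results from the manuscript.

```python
def ranks(values):
    # Rank 1 is the most influential parameter (largest index); assumes no ties.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(a, b):
    # Spearman's rho from rank differences (valid when there are no ties).
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical indices for four parameters from three methods.
sobol = [0.60, 0.25, 0.10, 0.01]
hsic = [0.30, 0.20, 0.15, 0.02]
rf = [0.20, 0.45, 0.10, 0.05]

rho_sobol_hsic = spearman(sobol, hsic)
rho_sobol_rf = spearman(sobol, rf)
print(rho_sobol_hsic, rho_sobol_rf)
```

Here the first two sets of indices differ in value but give the same ranking, so their rank correlation is 1, whereas the third swaps the two most influential parameters and scores lower. With ties or many near-zero indices, a tie-aware implementation or a top-weighted measure would be more appropriate.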

Comment 9. A possible direction for future research is to evaluate how sensitivity analysis results change by changing the selected parameter distributions (normal, log-normal, uniform,…) since there is an unavoidable uncertainty associated with defining feasible ranges of parameters.

References:
Becker, W. E., Tarantola, S., & Deman, G. (2018). Sensitivity analysis approaches to high-dimensional screening problems at low sample size. Journal of Statistical Computation and Simulation, 88(11), 2089-2110.

Huo, X., Gupta, H., Niu, G. Y., Gong, W., & Duan, Q. (2019). Parameter sensitivity analysis for computationally intensive spatially distributed dynamical environmental systems models. Journal of Advances in Modeling Earth Systems, 11(9), 2896-2909.

Nossent, J., Elsen, P., & Bauwens, W. (2011). Sobol' sensitivity analysis of a complex environmental model. Environmental Modelling & Software, 26(12), 1515-1525.

Sarrazin, F., Pianosi, F., & Wagener, T. (2016). Global Sensitivity Analysis of environmental models: Convergence and validation. Environmental Modelling & Software, 79, 135-152.

Sheikholeslami, R., & Razavi, S. (2020). A fresh look at variography: measuring dependence and possible sensitivities across geophysical systems from any given data. Geophysical Research Letters, 47(20), e2020GL089829.

Sheikholeslami, R., Gharari, S., Papalexiou, S. M., & Clark, M. P. (2021). VISCOUS: A Variance-Based Sensitivity Analysis Using Copulas for Efficient Identification of Dominant Hydrological Processes. Water Resources Research, 57, e2020WR028435.

Sheikholeslami, R., Razavi, S., Gupta, H. V., Becker, W., & Haghnegahdar, A. (2019). Global sensitivity analysis for high-dimensional problems: How to objectively group factors and measure robustness and convergence while reducing computational cost. Environmental Modelling & Software, 111, 282-299.

Tang, Y., Reed, P., Wagener, T., & Van Werkhoven, K. (2007). Comparing sensitivity analysis methods to advance lumped watershed model identification and evaluation. Hydrology and Earth System Sciences, 11, 793-817.

Touzani, S., & Busby, D. (2014). Screening method using the derivative-based global sensitivity indices with application to reservoir simulator. Oil & Gas Science and Technology–Revue d’IFP Energies nouvelles, 69(4), 619-632.

Strobl, C., Boulesteix, A. L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics, 8(1), 1-21.

ED: Reconsider after major revisions (25 Aug 2022) by Charles Onyutha

AR by Emilie Rouzies on behalf of the Authors (15 Jan 2023) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (16 Jan 2023) by Charles Onyutha

RR by Anonymous Referee #4 (12 Mar 2023)

ED: Publish subject to minor revisions (review by editor) (14 Mar 2023) by Charles Onyutha

AR by Emilie Rouzies on behalf of the Authors (20 Mar 2023) Author's response Author's tracked changes Manuscript

ED: Publish as is (22 Mar 2023) by Charles Onyutha

AR by Emilie Rouzies on behalf of the Authors (17 Apr 2023) Author's response Manuscript

Short summary

Water and pesticide transfer models are complex and should be simplified to be used in decision support. Indeed, these models simulate many interacting spatial processes and involve a large number of parameters. Sensitivity analysis allows us to select the most influential input parameters, but it has to be adapted to spatial modelling. This study will identify relevant methods that can be transposed to any hydrological and water quality model and improve knowledge of pesticide fate.