Distributed under the Creative Commons Attribution 4.0 License.
ML-AMPSIT: Machine Learning-based Automated Multi-method Parameter Sensitivity and Importance analysis Tool
Abstract. The accurate calibration of parameters in atmospheric and Earth system models is crucial for improving their performance, but it remains a challenge because of the models' inherent complexity: their input-output relationships are often characterized by multiple interactions between parameters, which hinders the use of simple sensitivity analysis methods. This paper introduces the Machine Learning-based Automated Multi-method Parameter Sensitivity and Importance analysis Tool (ML-AMPSIT), a new tool designed to provide a simple and flexible framework for estimating the sensitivity and importance of parameters in complex numerical weather prediction models. The tool leverages the strengths of multiple regression-based and probabilistic machine learning methods, including LASSO, Support Vector Machine, Classification and Regression Trees, Random Forest, Extreme Gradient Boosting, Gaussian Process Regression, and Bayesian Ridge Regression. These regression algorithms are used to construct computationally inexpensive surrogate models that predict model outputs from input parameters, thereby significantly reducing the computational burden of running high-fidelity models for sensitivity analysis. Moreover, the multi-method approach allows for a comparative analysis of the results. Through a detailed case study in which the Weather Research and Forecasting (WRF) model coupled with the Noah-MP land surface model simulates a sea breeze circulation over an idealized flat domain, ML-AMPSIT is shown to efficiently predict the model response to variations in Noah-MP parameters with a relatively small number of model runs. This paper shows how ML-AMPSIT can be an efficient tool for performing sensitivity and importance analysis even for complex models, guiding the user through the different steps and simplifying and automating the process.
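As a minimal illustration of the surrogate idea described above (not the tool's actual code), the sketch below fits a Gaussian process regressor with an RBF kernel to 30 runs of a hypothetical stand-in model (`expensive_model`) with three parameters rescaled to [0, 1], then evaluates the surrogate at 10 000 new parameter sets at negligible cost. The names `GPSurrogate` and `expensive_model` are assumptions for illustration. The kernel lengthscale and variance are fixed by hand here; in practice these hyperparameters are usually fitted, for example by maximizing the marginal likelihood.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale, variance):
    """Squared-exponential (RBF) covariance between the rows of a and b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


class GPSurrogate:
    """Hypothetical zero-mean GP regressor (on centred targets) with hand-set hyperparameters."""

    def __init__(self, lengthscale=0.3, variance=1.0, noise=1e-6):
        self.lengthscale = lengthscale  # hyperparameter: normally fitted, fixed here
        self.variance = variance        # hyperparameter: prior signal variance
        self.noise = noise              # jitter / observation-noise variance

    def fit(self, x, y):
        self.x_train = x
        self.y_mean = y.mean()
        k = rbf_kernel(x, x, self.lengthscale, self.variance)
        k[np.diag_indices_from(k)] += self.noise
        self.chol = np.linalg.cholesky(k)  # K + noise*I = L L^T
        # alpha = (K + noise*I)^{-1} (y - mean), via two triangular solves
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, y - self.y_mean))
        return self

    def predict(self, x):
        """Posterior mean and standard deviation at the rows of x."""
        k_star = rbf_kernel(x, self.x_train, self.lengthscale, self.variance)
        mean = self.y_mean + k_star @ self.alpha
        v = np.linalg.solve(self.chol, k_star.T)
        var = self.variance - np.sum(v**2, axis=0)
        return mean, np.sqrt(np.maximum(var, 0.0))


def expensive_model(p):
    """Hypothetical stand-in for one scalar model output per parameter set."""
    return np.sin(2 * np.pi * p[:, 0]) + 0.5 * p[:, 1] ** 2 + 0.1 * p[:, 2]


rng = np.random.default_rng(0)
x_train = rng.uniform(size=(30, 3))    # 30 "model runs", 3 parameters in [0, 1]
y_train = expensive_model(x_train)
surrogate = GPSurrogate().fit(x_train, y_train)

x_new = rng.uniform(size=(10_000, 3))  # cheap surrogate evaluations
mean, std = surrogate.predict(x_new)
```

Predictions far from the training samples revert to the prior (the mean of the training outputs, with a standard deviation equal to the square root of the kernel variance), which is one reason the parameter ranges sampled for training matter.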
Status: final response (author comments only)
-
CEC1: 'Comment on gmd-2024-56', Juan Antonio Añel, 12 May 2024
Dear authors,
I have to bring to your attention a couple of issues related to our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
First, your manuscript includes a link to GitHub in the "Code and Data Availability" section. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other alternatives for long-term archival and publishing, such as Zenodo. You have in fact done so, but you have not included this information in your manuscript. To better comply with our policy, I include here the link to the Zenodo repository where you have deposited part of the code used in your work, specifically the ML-AMPSIT code: "http://dx.doi.org/10.5281/zenodo.10789930". In future versions of your manuscript, please remove the GitHub link and include the Zenodo one.
Also, you must specify in the "Code and Data Availability" section the versions of WRF and Noah-MP used in your work, and you must provide a permanent repository with a DOI from which they can be retrieved. Additionally, as this paper is based on a machine learning code that requires training, you must publish in a permanent repository the data used for training and to produce the results that you present.
Therefore, please, reply to this comment as soon as possible with the WRF and Noah-MP versions that you use, and the links and DOIs for them and the training data.
Please, be aware that if you do not fix this problem, we will have to reject your manuscript for publication in our journal. I should note that, given this lack of compliance with our policy, your manuscript should not have been accepted in Discussions. Therefore, the current situation with your manuscript is irregular.
Juan A. Añel
Geosci. Model Dev. Exec. Editor
Citation: https://doi.org/10.5194/gmd-2024-56-CEC1 -
AC1: 'Reply on CEC1', Dario Di Santo, 13 May 2024
Dear Dr. Añel,
Thank you for your valuable comments, we appreciate your guidance and aim to address the issues promptly.
Concerning the versions of the Weather Research and Forecasting (WRF) model and Noah-MP used in our research, we used WRF version 4.4, which includes a built-in version of Noah-MP (v4.4). The WRF DOI is doi:10.5065/D6MK6B4K.
Additionally, we have made all data used in our analyses publicly available. The dataset can be retrieved from Zenodo at https://dx.doi.org/10.5281/zenodo.11184569.
We will promptly include all the necessary links and DOIs in the next version of our manuscript.
Please let us know if there are any further adjustments or missing information needed to meet the journal's requirements.
Best regards,
Di Santo Dario
Citation: https://doi.org/10.5194/gmd-2024-56-AC1 -
CEC2: 'Reply on AC1', Juan Antonio Añel, 18 May 2024
Dear authors,
Thanks for addressing this issue.
Best regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-56-CEC2
-
RC1: 'Comment on gmd-2024-56', Anonymous Referee #1, 13 Jun 2024
In the manuscript “ML-AMPSIT: Machine Learning-based Automated Multi-method Parameter Sensitivity and Importance analysis Tool” the authors present a software tool to estimate the importance of parameters in weather prediction models. The tool is based on the use of several machine learning (ML) models as computationally cheap surrogate models, which can be used to emulate large numbers of experiments and to estimate the importance or sensitivity of the potentially many parameters to be calibrated. ML-AMPSIT provides the user with several functions that can be used to assess the convergence and the quality of the trained methods, and the multi-method approach also provides an estimate of how reliable the estimated importance is. The authors also present an application of their framework to the Weather Research and Forecasting (WRF) model coupled with the Noah-MP land surface model, simulating a sea breeze circulation over an idealized flat domain and predicting the effects of 6 parameters on the model output.
Overall, the manuscript is clearly written and understandable. The authors recognize the crucial importance and the challenges posed by sensitivity analysis experiments in the calibration of weather prediction models, which gives the manuscript a clear motivation. The working of the ML-AMPSIT framework is presented well, and so are the results of its application on the case study.
There are only some points where the authors should clarify their statements or expressions, or provide more extensive explanations, in order to improve the clarity and the overall quality of the manuscript. I will list these points below. Once these points have been successfully addressed, I would be happy to recommend the manuscript for publication.
- In the introduction it is mentioned that “ML techniques have gained traction in weather and climate modeling and observations […] particularly in parameter optimization tasks like calibration”, but I feel several relevant works exploring the use of emulators for tuning weather prediction and climate models, closely related to the long-term aims of the authors as far as I can interpret, are missing. I feel these should be cited. Here are a few examples.
- Daniel Williamson, Michael Goldstein, Lesley Allison, Adam Blaker, Peter Challenor, Laura Jackson, and Kuniko Yamazaki, “History matching for exploring and reducing climate model parameter space using observations and a large perturbed physics ensemble” (2013)
- Fleur Couvreux et al., “Process-Based Climate Model Development Harnessing Machine Learning: I. A Calibration Tool for Parameterization Improvement” (2020)
- Katherine Dagon, Benjamin M. Sanderson, Rosie A. Fisher, and David M. Lawrence, “A machine learning approach to emulation and biophysical parameter estimation with the Community Land Model, version 5” (2020)
- Duncan Watson-Parris, Andrew Williams, Lucia Deaconu, and Philip Stier, “Model calibration using ESEm v1.1.0 – an open, scalable Earth system emulator” (2021)
- Davide Cinquegrana, Alessandra Lucia Zollo, Myriam Montesarchio, and Edoardo Bucchignani, “A Metamodel-Based Optimization of Physical Parameters of High Resolution NWP ICON-LAM over Southern Italy” (2023)
- In Page 4 it is stated that “There is no upper limit for the number of parameters that can be analyzed”, but of course the higher the dimensionality, the harder the training of a surrogate can become. It would be useful to specify here how the number of simulations required scales with the number of parameters.
- In Page 7, Eq. (2), a definition of the terms V_{i,j,...} is missing, and should be added.
- In the Sections from 2.3.1 to 2.3.5 it is unclear how these different algorithms are used to compute an importance metric for the parameters. As far as I understood, the Sobol indices (the first-order one specifically) are computed only using Gaussian processes and Bayesian ridge regression. What is then precisely done when using the other ML algorithms explained? This explanation should be added to the manuscript.
- In Page 10, Section 2.3.6, the authors state that “GPR is a non-parametric method, i.e., it does not make assumptions about the functional form of the relationship between the input and output variables”. The underlying assumptions on the functional form are contained in the chosen kernel, so there are in fact assumptions one has to make when using Gaussian processes. Maybe the authors here mean that there is no assumption of linearity with the chosen RBF kernel (as they specify later on)? Also, it seems that the authors do train the parameters of the kernel (e.g., lengthscale), so the adjective “non-parametric” may be confusing here.
- In Page 10, Section 2.3.7, it should be specified what E and H in the equations mean in the context of the problem considered.
- In page 11, Section 2.3.7, the authors state “The same procedure used for the GPR algorithm to leverage the probabilistic output for deriving feature importance coefficients is also implemented here to compute the Sobol first-order sensitivity index”. It is unclear to me why the probabilistic nature of GPR or BRR is important for the calculation of the Sobol indices. In principle, ‘deterministic’ emulators like neural networks can also be used to calculate Sobol indices. Can the authors comment on what they mean by this?
- In Page 13, the authors write “The spread of the ensemble tends to be larger over water than over land, especially before sunrise, indicating that the variation of the input parameters has a larger effect on v over water”. Since most of the parameters varied were land-related parameters, I find this seemingly counterintuitive. Do the authors have a qualitative explanation for that?
- From Page 14, when presenting the results the authors refer to the “importance” of the parameters, but no formula for this was given, especially in the context of LASSO, SVM, CART, RF, XGBoost. Please add a proper definition of it in the manuscript.
- In the end, Page 26, the authors state “It is then clear that ML-AMPSIT significantly reduces the number of simulations needed for sensitivity analysis and extraction of feature importance”. I find this statement somewhat too strong, and it should be toned down. It is by no means clear that 20 or 30 simulations will be sufficient to train the emulators to produce faithful outputs. Specifically, as pointed out by the authors, the comparable performance of the investigated methods suggests the absence of strong non-linearities, which obviously makes the training of the methods more efficient. I expect that in the presence of strong non-linearities the amount of training data will need to be increased, so it remains an open question whether this number will be systematically smaller than for other existing methods.
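The points above on how Sobol indices are obtained from an emulator, and on how the cost grows with the number of parameters, can be made concrete with a minimal sketch. It assumes independent inputs rescaled to [0, 1] and uses the standard Monte Carlo estimators associated with Saltelli and co-workers (first order) and Jansen (total effect); `sobol_indices` and `toy_emulator` are hypothetical names, and this is not ML-AMPSIT's implementation. The emulator is called as a plain deterministic function, so any trained regressor, probabilistic or not, could be plugged in.

```python
import numpy as np


def sobol_indices(model, n_params, n_base, rng):
    """Monte Carlo first-order and total Sobol indices of a deterministic model.

    Inputs are assumed independent and uniform on [0, 1]. The cost is
    n_base * (n_params + 2) calls to `model`.
    """
    a = rng.uniform(size=(n_base, n_params))    # base sample matrix A
    b = rng.uniform(size=(n_base, n_params))    # independent sample matrix B
    f_a, f_b = model(a), model(b)
    var_y = np.var(np.concatenate([f_a, f_b]))  # total output variance V(Y)

    first = np.empty(n_params)
    total = np.empty(n_params)
    for i in range(n_params):
        ab_i = a.copy()
        ab_i[:, i] = b[:, i]                    # A with column i taken from B
        f_ab = model(ab_i)
        first[i] = np.mean(f_b * (f_ab - f_a)) / var_y       # first-order S_i
        total[i] = 0.5 * np.mean((f_a - f_ab) ** 2) / var_y  # total-effect S_Ti
    n_evals = n_base * (n_params + 2)
    return first, total, n_evals


def toy_emulator(x):
    """Hypothetical trained surrogate: additive in x0 and x1, ignores x2."""
    return x[:, 0] + 2.0 * x[:, 1]


first, total, n_evals = sobol_indices(
    toy_emulator, n_params=3, n_base=100_000, rng=np.random.default_rng(1)
)
```

For this additive toy function the analytical first-order and total indices are both (0.2, 0.8, 0), and the call uses 100 000 × (3 + 2) = 500 000 emulator evaluations. The N(k + 2) cost grows linearly with the number of parameters k, which is why such estimates are normally only affordable on an emulator; how many training runs the emulator itself needs is a separate question that this sketch does not answer.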
Citation: https://doi.org/10.5194/gmd-2024-56-RC1 -
AC3: 'Reply on RC1', Dario Di Santo, 02 Aug 2024
-
CC1: 'Comment on gmd-2024-56', Benjamin Püschel, 21 Jun 2024
This comment was written by Benjamin Püschel and Isabella Winterer, two master's students in Meteorology at the University of Vienna, enrolled in the seminar course "Paper Club", coordinated by Prof. Andreas Stohl and Dr. Stefano Serafin, both of whom also contributed to this comment.
In this course students are asked to critically review papers that are already published or, in this case, in discussion. The paper by Di Santo et al. was taken as an example at which the students practiced their skills in critically reviewing scientific literature. Below we report the most important points that came up in the process.
General assessment:
We found the paper by Di Santo et al. interesting to read and a timely contribution to the scientific literature but have identified a few points, where improvements could be made. We hope that the authors, reviewers and the editor will find the comments below helpful.
The manuscript presents a novel tool to perform sensitivity and importance assessments of some parameters used in an NWP model. The tool applies multiple established machine learning methods to construct inexpensive surrogate models of the system and calculate parameter sensitivity and feature importance. The output allows users to assess the quality and convergence of the ML methods and their resulting feature importance values. The tool's functionality is demonstrated in a case study using the NWP model WRF coupled with the land surface model (LSM) Noah-MP, which examines the sensitivity of land/sea breeze simulations to six surface parameters.
Major Comments:
- Sec. 2.3.1-2.3.7 & L80-82 While it is stated that the included ML methods are among the most commonly used, further justification is needed as to why exactly these seven methods are utilized. In particular, the utilization of tree-based methods requires explanation, as they demonstrate lower performance compared to other methods. Could they perform better or give additional insights in other cases? Otherwise, they might not be useful enough to be included in the tool.
- Sec. 2.3.1-2.3.5 We suggest a more detailed description of how feature importance is calculated/extracted for the methods LASSO, Support Vector Machine, Classification and Regression Trees, Random Forest, and Extreme Gradient Boosting. We noticed that the importances of all features do not sum to 1 for every ML method, suggesting that the feature importances are not normalized (e.g. Figs 10 & 11). However, non-normalized feature importances do not allow direct comparison of values between different ML methods (as done in e.g. L443-446). An explanation of the feature importance calculation would greatly clarify these ambiguities.
- Sec. 2.3.8 The algorithm depends on an initial guess of the plausible ranges of the parameters (features) whose importance is being estimated. The range boundaries of the six tested parameters are not clearly justified in this work, and they do not seem to be adjustable by the user (in configAMPSIT.json). The feature importance estimates are likely to be inaccurate if the initial parameter ranges are unrealistic. Some additional discussion of this aspect, and greater flexibility in the configuration of the algorithm, would be desirable.
- The paper is highly technical but lacks physical interpretation of the results. Physical explanations like the one given in lines 434-435 should be added also elsewhere. This would help the readers to better understand the usefulness of the tool in the concrete case presented.
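The normalization issue raised in the major comments above can be illustrated with a small sketch: raw scores from different methods (here hypothetical values standing in for signed, LASSO-style coefficients and for non-negative, tree-style scores on an arbitrary scale) are converted to magnitudes and rescaled to sum to 1, so that they share a common scale. `normalize_importance` is a hypothetical helper, not part of ML-AMPSIT.

```python
def normalize_importance(raw_scores):
    """Rescale raw importance scores so that their magnitudes sum to 1.

    Signed scores (e.g. regression coefficients) are converted to magnitudes
    first, since importance refers to the size of an effect, not its sign.
    """
    magnitudes = [abs(s) for s in raw_scores]
    total = sum(magnitudes)
    if total == 0:
        return [0.0] * len(magnitudes)  # no parameter carries any importance
    return [m / total for m in magnitudes]


# Hypothetical raw scores for three parameters from two different methods
lasso_coefficients = [0.8, -0.2, 0.0]  # signed, unbounded
tree_importances = [0.6, 0.3, 0.3]     # non-negative, arbitrary scale

print(normalize_importance(lasso_coefficients))
print(normalize_importance(tree_importances))
```

Rescaling only makes the numbers comparable in scale; it does not make different definitions of importance equivalent, and coefficient-based scores are only meaningful as importances if the input parameters were standardized (or scaled to a common range) before fitting.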
Minor Comments:
- In the model setup, while other boundary conditions are reported, the sea surface temperatures used are not.
- Reduce the number of plots/subplots, especially if they don’t contain additional information, e.g. only show subplots with interesting vertical variation in Figs 12 & 13; one plot showing the mean vertical variation in MSE over land instead of Figs 14 & 15 would be enough to visualize the takeaways in L460-465.
- The quality of most figures is not entirely satisfying but could be improved with relatively little effort. For instance:
- Add a grid to the background of all figures.
- Increase font size in legends of Figs 3 & 4.
- Increase font size of labels in Fig 5 and title of subplot c).
- Add a second y-axis for the p-value in Figs 5, 8, 9 as it is close to 0.
- Swap x- and y-axis in Figs 12, 13, 14, 15 since height coordinates are usually represented on the y-axis.
- Increase line width and use both colors and line styles to differentiate between lines in all plots. This would greatly increase visibility, especially for color-blind people.
- Is there a reason why the area under the curves is colored in the feature importance timeseries? (Figs 5, 10, 11).
- Typos in L123, 128, 151, 170, Fig 1: script names should be *.ipynb instead of *.ipybn.
- L432 Fig 11 should be linked.
- The paragraph L411-422 could link to Figs 8 & 9 more often for clarity and convenience of reading.
Citation: https://doi.org/10.5194/gmd-2024-56-CC1 -
AC4: 'Reply on CC1', Dario Di Santo, 02 Aug 2024
We thank Benjamin Püschel, Isabella Winterer, Prof. Andreas Stohl and Dr. Stefano Serafin for their insightful comments. We appreciate that you have chosen our paper for your seminar course. Your detailed suggestions will be very useful to improve the quality of the paper. Thanks! Please find attached the detailed report outlining our responses to each specific comment.
-
RC2: 'Comment on gmd-2024-56', Anonymous Referee #2, 26 Jul 2024
General comments: I found the paper by Di Santo et al. to be an interesting contribution to the scientific literature.
The manuscript presents a framework that aims to provide a flexible and easy-to-use way of performing sensitivity and importance analysis for complex models, using surrogate models generated through several ML techniques.
However, I have identified several areas where improvements could be made. I hope that the authors, reviewers and editor will find the suggestions below constructive and useful.
A general suggestion to improve readability, given the high number of acronyms and codes declared in the manuscript, is to add an acronym table that condenses the abbreviations, and another table in the Methods section summarizing the main characteristics of the mathematical techniques, so that the reader is not overwhelmed by all these methods and their details at once.
A connecting paragraph at the end of the Introduction is needed to provide a smooth transition between sections.
An introductory paragraph for the Methods section could also be helpful.
Specific comments:
- Page 7, Eq. (2): the definition of the terms V_{i,j,...} is missing.
- Page 12, Eq. (5): introduce the terms in the equation that are not described in the text (θ_s), and give units in square brackets []. What is TOPMODEL?
- Paragraph 355, page 13: give more arguments for the selection of the two locations (one over land and one over water). I understand that they are very different locations, but explain to the reader that the aim is to have two sites that represent different dynamics in the model due to the input parametrizations of each site.
- Section 3.2 (Model setup) should include a figure showing the characteristics of the domain, or at least a table summarizing the main characteristics of the model domains.
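For reference, terms such as V_{i,j,...} in Eq. (2) are conventionally defined through the Sobol' variance decomposition, sketched below under the assumption of independent inputs X_1, ..., X_k and a scalar output Y; the manuscript's own notation may differ.

```latex
\begin{aligned}
V(Y) &= \sum_{i=1}^{k} V_i + \sum_{i<j} V_{ij} + \dots + V_{12\ldots k}, \\
V_i &= \operatorname{Var}_{X_i}\!\big(\mathbb{E}_{\mathbf{X}_{\sim i}}[\,Y \mid X_i\,]\big), \\
V_{ij} &= \operatorname{Var}_{X_i X_j}\!\big(\mathbb{E}_{\mathbf{X}_{\sim ij}}[\,Y \mid X_i, X_j\,]\big) - V_i - V_j, \\
S_i &= \frac{V_i}{V(Y)}, \qquad S_{ij} = \frac{V_{ij}}{V(Y)},
\end{aligned}
```

Here \mathbf{X}_{\sim i} denotes all inputs except X_i, higher-order terms are defined recursively in the same way, and dividing the first line by V(Y) shows that all indices together sum to 1.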
I agree with the following minor comment in CC1: 'Comment on gmd-2024-56', Benjamin Püschel, 21 Jun 2024:
"
- The quality of most figures is not entirely satisfying but could be improved with relatively little effort. For instance:
- Add a grid to the background of all figures.
- Increase font size in legends of Figs 3 & 4.
- Increase font size of labels in Fig 5 and title of subplot c).
- Add a second y-axis for the p-value in Figs 5, 8, 9 as it is close to 0.
- Swap x- and y-axis in Figs 12, 13, 14, 15 since height coordinates are usually represented on the y-axis.
- Increase line width and use both colors and line styles to differentiate between lines in all plots. This would greatly increase visibility, especially for color-blind people.
- Is there a reason why the area under the curves is colored in the feature importance timeseries? (Figs 5, 10, 11).
"
About the references section
- It is suggested to add a couple more references from 2024 to update the state of the art in the manuscript.
- Format all the dates in the reference section homogeneously, i.e. all as "... (year) ...", not "... (month year) ...".
Citation: https://doi.org/10.5194/gmd-2024-56-RC2 -
AC2: 'Reply on RC2', Dario Di Santo, 02 Aug 2024
Model code and software
ML-AMPSIT, Dario Di Santo, Cenlin He, Fei Chen, and Lorenzo Giovannini, https://doi.org/10.5281/zenodo.10789930