Determining the sensitive parameters of the WRF model for the prediction of tropical cyclones in the Bay of Bengal using global sensitivity analysis and machine learning

The present study focuses on identifying the parameters of the Weather Research and Forecasting (WRF) model that strongly influence the prediction of tropical cyclones over the Bay of Bengal (BoB) region. Three global sensitivity analysis (SA) methods, namely the Morris One-at-A-Time (MOAT), Multivariate Adaptive Regression Splines (MARS), and surrogate-based Sobol' methods, are employed to identify the most sensitive of 24 tunable parameters drawn from seven parameterization schemes of the WRF model. Ten tropical cyclones of different categories, such as cyclonic storms, severe cyclonic storms, and very severe cyclonic storms, over the BoB between 2011 and 2018 are selected for this study. The sensitivity scores of the 24 parameters are evaluated for eight meteorological variables. The parameter sensitivity results are consistent across the three SA methods for all the variables, and 8 of the 24 parameters contribute 80%-90% of the overall sensitivity scores. It is found that the Sobol' method with Gaussian process regression as a surrogate model can produce reliable sensitivity results when the available samples exceed 200. The parameter values for which the model simulations have the lowest RMSE against observations are taken as the optimal parameters. Comparing observations with model simulations using the default and optimal parameters shows that predictions with the optimal parameter set yield improvements of 19.65% in surface wind, 6.5% in surface temperature, and 13.2% in precipitation, compared to the default parameter set.


Introduction
The Indian subcontinent is vulnerable to tropical cyclones that develop in the North Indian Ocean (NIO), which consists of the Arabian Sea and the Bay of Bengal (BoB). These cyclones, most of which form during the pre-monsoon and post-monsoon seasons, invariably cause widespread destruction to life and property. Optimizing the tunable parameters of the WRF model to improve cyclone prediction, however, brings several obstacles. First, a vast number of model simulations are required to perform parameter optimization, and this number grows beyond 10^4 as the parameter dimension increases. Second, the WRF model simulates various meteorological variables, and each parameter may influence more than one variable; parameter optimization therefore needs to consider several variables at once, which increases the computational cost even further (Chinta & Balaji, 2020). Keeping the availability of computational resources in mind, performing thousands of numerical simulations for long-duration events such as tropical cyclones is extremely expensive. The best remedy is to first use sensitivity analysis to identify the parameters that significantly impact the model simulation, and then calibrate these parameters to reduce the model prediction error.
Sensitivity analysis is the estimation of the uncertainty in model outputs contributed by variations in model inputs (Saltelli, 2002). Several researchers (Green & Zhang, 2014; Quan et al., 2016; Di et al., 2017; Ji et al., 2018; Wang et al., 2020; Chinta et al., 2021) have conducted sensitivity analyses of WRF model parameters using various methods. Green & Zhang (2014) studied the sensitivity of four parameters in the WRF model to the intensity and structure of Hurricane Katrina, using single- and multi-parameter designs, and reported that two parameters significantly affect the intensity and structure. Quan et al. (2016) examined the sensitivity of 23 adjustable parameters of the WRF model to 11 atmospheric variables for simulations of nine five-day summer monsoon heavy precipitation events over Greater Beijing, using the Morris One-at-A-Time (MOAT) method; 6 of the 23 parameters were sensitive for most variables, and five parameters were sensitive for specific variables. Di et al. (2017) conducted sensitivity experiments on 18 parameters of the WRF model with respect to precipitation and surface temperature, for simulations of nine two-day rainy events and nine two-day sunny events over Greater Beijing, using four sensitivity analysis methods: the delta test, the sum of trees, Multivariate Adaptive Regression Splines (MARS), and the Sobol' method. The results showed that five parameters greatly affected precipitation and two parameters affected surface temperature. Ji et al. (2018) investigated the sensitivity of 11 parameters to precipitation and its related variables for 30-day forecasts over China, using the MOAT and surrogate-based Sobol' methods, and found the Gaussian Process Regression (GPR) based Sobol' method to be more efficient than the MOAT method.
The results also showed that four parameters significantly affect precipitation and its associated quantities. Wang et al. (2020) studied the sensitivity of 20 parameters to various meteorological and model variables for 30-day simulations over the Amazon region, employing the MOAT, MARS, and surrogate-based Sobol' methods; the three methods were consistent, and six of the twenty parameters contributed 80%-90% of the total variance. Chinta et al. (2021) studied the sensitivity of 23 parameters to eleven meteorological variables for simulations of twelve 4-day precipitation events during the Indian summer monsoon, using the WRF model. The sensitivity analysis was conducted using the MOAT method with ten repetitions, and 9 of the 23 parameters had a considerable impact on the model outputs. These studies show that hundreds of numerical simulations are required to perform sensitivity analysis; thus, while selecting the sensitivity analysis methods and the number of parameters, the computational cost is a critical factor. Razavi & Gupta (2015) extensively reviewed numerous sensitivity analysis methods and reported that each method is built on a different set of underlying definitions, and the results from these methods do not always coincide. They proposed that, while selecting a global sensitivity analysis method, one needs to consider four important characteristics, namely (i) local sensitivities, (ii) the global distribution of local sensitivities, (iii) the global distribution of model responses, and (iv) the structural organization of the response surface. They also noted that relying on a single sensitivity analysis method may not yield reliable results, since one method alone may not capture all of these characteristics.
From these studies, one can infer that more than one SA method needs to be explored to improve the level of confidence in the results obtained from sensitivity studies.
The objective of the present study is to assess the sensitivity of the WRF model parameters to various meteorological variables such as surface pressure, temperature, wind speed, precipitation, and model variables such as radiation fluxes and boundary layer height, for the simulations of tropical cyclones over the BoB region, using three different global sensitivity analysis methods. This paper is organized as follows: a brief description of sensitivity analysis methods is presented in section 2. Section 3 presents the design of numerical experiments and sensitivity experimental setup.
Section 4 shows the results of the three sensitivity analysis methods and a comparison between simulations and observations, and section 5 gives the summary and conclusions.

Sensitivity Analysis Methods
Sensitivity analysis is the assessment of uncertainties in model outputs that are attributed to the variations in inputs factors (Saltelli et al., 2008). The sensitivity analysis proceeds as follows: (1) selecting the right model and corresponding best set of physics schemes, (2) identifying the adjustable input parameters and corresponding ranges, (3) choosing the sensitivity analysis methods, (4) running the design of experiments to generate the sample set of input parameters and running the model using these parameter sets, and (5) analyzing the model outputs obtained by different parameter samples and quantifying the sensitivity of selected parameters.
Sensitivity analysis methods are classified as derivative-based, response-surface-based, and variance-based approaches (Wang et al., 2020). In mathematical terms, the change of an output with respect to a change in an input is referred to as the sensitivity of that input, which is the principle of derivative-based SA. The Morris One-at-A-Time (MOAT) method is a derivative-based SA method (see subsection 2.1).
The response-surface-based approach works on the differences between the responses of a mathematical model built with all the input factors and one built with all but a particular input factor. The Multivariate Adaptive Regression Splines (MARS) method comes under this category (see subsection 2.2). In the variance-based approaches, the sensitivity of an input variable is defined as the contribution of the variance caused by that variable to the total variance of the model output: if the output variance is decomposed into the contributions of each individual input and their interactions, the highly sensitive factors have the larger variance contributions. The Sobol' sensitivity analysis comes under this approach (see subsection 2.3). The MOAT method requires a uniform space-filling design, whereas the MARS and Sobol' methods require random space-filling designs. The MOAT and MARS methods give a more qualitative analysis, whereas the Sobol' method gives a quantitative analysis (Wang et al., 2020). As stated by Razavi & Gupta (2015), no single sensitivity analysis method suits all applications, and the use of several complementary SA methods can improve confidence in the sensitivity results by compensating for the drawbacks of the individual methods. Thus, in the present study, three widely used SA methods with differing methodologies are selected. The parameters that turn out to be significant in all three methods can then be identified with greater confidence; these are the most influential parameters to target in order to improve forecast skill.

The MOAT method
The MOAT is a derivative-based sensitivity analysis method, also known as the elementary effects method, which evaluates parameter sensitivity according to the elementary effects of individual parameters (Morris, 1991). Consider a model with n input parameters $X = (x_1, x_2, \ldots, x_n)$ with variability in their ranges. The parameters are normalized to the interval [0, 1]. The parameter space is divided into $p$ equally spaced levels, $\{0, \frac{1}{p-1}, \frac{2}{p-1}, \ldots, \frac{p-2}{p-1}, 1\}$, where $p$ is a user-defined integer. An initial vector of input parameters $X^1 = (x_1^1, x_2^1, \ldots, x_n^1)$ is created randomly by taking values from the defined parameter space. Following the One-at-A-Time approach, one parameter is selected and perturbed by $\Delta$, i.e., $X_m^1 = (x_1^1, x_2^1, \ldots, x_m^1 \pm \Delta, \ldots, x_n^1)$, where $\Delta$ is a randomly selected multiple of $\frac{1}{p-1}$. The model is run using the initial and perturbed vectors, and the elementary effect of that parameter is calculated as
$$EE_m^1 = \frac{f(X_m^1) - f(X^1)}{\Delta}.$$
The subscript $m$ indicates that the $m$-th parameter is perturbed, and the superscript 1 denotes the first MOAT trajectory. In a single trajectory, this process is repeated for all parameters to compute the elementary effect of every parameter. The entire trajectory is replicated $r$ times with random starting points to obtain reliable sensitivity results, so a total of $r \times (n + 1)$ model simulations are required to complete the MOAT sensitivity analysis. The modified mean $\mu_m$ of $|EE_m|$ and the standard deviation $\sigma_m$ of $EE_m$ are used as the sensitivity indices of the input parameter $x_m$:
$$\mu_m = \frac{1}{r} \sum_{i=1}^{r} |EE_m^i|, \qquad \sigma_m = \sqrt{\frac{1}{r} \sum_{i=1}^{r} \Big(EE_m^i - \frac{1}{r}\sum_{j=1}^{r} EE_m^j\Big)^2}.$$
A high value of $\mu_m$ implies that the parameter $x_m$ has a significant impact on the model output, whereas a high value of $\sigma_m$ indicates nonlinearity of $x_m$ or strong interactions with other parameters.
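The procedure above can be sketched in a few lines of Python (an illustrative numpy implementation, not the code used in the study; the function name and the toy model in the usage note are our own):

```python
import numpy as np

def moat_sensitivity(f, n_params, p=4, r=10, seed=0):
    """Morris One-at-A-Time screening for a model f: [0, 1]^n -> scalar.

    p is the number of grid levels, r the number of random trajectories.
    Returns the modified mean mu* of |EE| and the standard deviation of EE
    for every parameter.
    """
    rng = np.random.default_rng(seed)
    levels = np.linspace(0.0, 1.0, p)            # {0, 1/(p-1), ..., 1}
    delta = p / (2.0 * (p - 1))                  # the usual MOAT step size
    start_levels = levels[levels + delta <= 1.0 + 1e-12]
    ee = np.zeros((r, n_params))
    for t in range(r):
        x = rng.choice(start_levels, size=n_params)  # random grid start point
        base = f(x)
        for m in rng.permutation(n_params):          # perturb one at a time
            x_pert = x.copy()
            x_pert[m] += delta
            new = f(x_pert)
            ee[t, m] = (new - base) / delta          # elementary effect EE_m
            x, base = x_pert, new
    return np.abs(ee).mean(axis=0), ee.std(axis=0)   # (mu*, sigma)
```

Each trajectory costs n + 1 model evaluations, so r trajectories cost r × (n + 1) runs, matching the count given above; for a linear toy model such as f(x) = 10·x1 + x2, the elementary effects recover the coefficients exactly and sigma is zero.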

The MARS method
MARS is an extension of the recursive partitioning regression model with continuous derivatives (Friedman, 1991). The model is constructed by a forward and a backward pass: the forward pass divides the entire domain into a number of partitions and produces an overfitted model through localized regressions in every partition, while the backward pass prunes the overfitted model to the best model by repeatedly removing the least important basis function one at a time. The MARS model can be written as
$$f(X) = a_0 + \sum_{m=1}^{M} a_m B_m(X),$$
where a basis function $B_m$ can be a constant, a hinge function $\max(0, \pm(x_i - t))$, or a product of two or more hinge functions. The coefficients $(a_0, a_1, \ldots, a_M)$ are determined by linear regression in every partition. The Generalized Cross-Validation (GCV) score of every model during the backward pass is calculated as
$$GCV(m) = \frac{\frac{1}{N}\sum_{i=1}^{N}\big(y_i - \hat{f}_m(X_i)\big)^2}{\big(1 - \frac{C(m)}{N}\big)^2},$$
where $N$ is the number of samples before pruning, $y_i$ is the target data point, $\hat{f}_m(X_i)$ is the $m$-th model's estimate corresponding to the input $X_i$, and $C(m)$ is the penalty factor that accounts for the increase in variance due to the increase in model complexity. The difference between the GCV scores of the pruned model and the overfitted model measures the importance of the parameter that has been removed: a larger difference indicates a higher sensitivity of that parameter.
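As an illustration, the hinge basis function and the GCV score can be written directly from the definitions above (a numpy sketch; the exact form of the complexity penalty C(m) is an assumption, since the paper does not spell it out):

```python
import numpy as np

def hinge(x, knot, sign=1.0):
    """A MARS hinge basis function: max(0, sign * (x - knot))."""
    return np.maximum(0.0, sign * (np.asarray(x, dtype=float) - knot))

def gcv_score(y, y_hat, n_basis, penalty=3.0):
    """GCV score of equation (5) for a model with n_basis basis functions.

    C(m) = n_basis + penalty * (n_basis - 1) / 2 is a common convention
    (an assumption on our part, not stated in the paper).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    c_m = n_basis + penalty * (n_basis - 1) / 2.0
    mse = np.mean((y - y_hat) ** 2)
    return mse / (1.0 - c_m / n) ** 2
```

Note how, for the same residuals, a model with more basis functions receives a larger GCV score: the denominator shrinks as C(m) grows, which is what drives the backward-pass pruning.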

The Sobol' method
The Sobol' sensitivity analysis works on the basis of variance decomposition (Sobol, 2001). Consider a response function $f(x)$ of a random vector $x = (x_1, \ldots, x_n)$. The ANalysis Of VAriance (ANOVA) decomposition of $f(x)$ is written as
$$f(x) = f_0 + \sum_{i=1}^{n} f_i(x_i) + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j) + \cdots + f_{12 \ldots n}(x_1, \ldots, x_n). \quad (6)$$
The variance of $f(x)$ can then be expressed as the sum of the variance contributions of each term in equation (6):
$$D = \sum_{i=1}^{n} D_i + \sum_{1 \le i < j \le n} D_{ij} + \cdots + D_{12 \ldots n}, \quad (7)$$
where $n$ is the total number of parameters, $D$ is the total variance of the output response function, $D_i$ is the variance contribution of $x_i$, $D_{ij}$ is the variance contribution of the interaction of $x_i$ and $x_j$, and $D_{12 \ldots n}$ is the variance contribution of the interaction of all parameters. The Sobol' sensitivity indices are defined as the ratios of the individual variances to the total variance:
$$S_i = \frac{D_i}{D}, \quad S_{ij} = \frac{D_{ij}}{D}, \quad (8)$$
$$\ldots, \quad S_{12 \ldots n} = \frac{D_{12 \ldots n}}{D}. \quad (9)$$
These indices quantify the first-order, second-order, and higher-order interaction effects, respectively.
From equations (8) and (9), it is evident that the sum of all the indices equals 1. The total-order sensitivity index of the $i$-th parameter can be calculated as the sum of all the interaction terms involving that parameter, i.e.,
$$S_i^T = S_i + \sum_{j \ne i} S_{ij} + \cdots + S_{12 \ldots n}. \quad (10)$$
Generally, while the computation of the first- and second-order effects is rather straightforward, the calculation of the higher-order effects is very expensive because the number of higher-order terms is very large.
To solve this problem, Homma & Saltelli (1996) introduced a total sensitivity index,
$$S_i^T = 1 - \frac{D_{-i}}{D}, \quad (11)$$
where $D_{-i}$ is the total variance of the response function without the effects of the $i$-th parameter. A higher total-order sensitivity index implies higher importance of that parameter.
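The first-order and total-order indices can be estimated with the standard pick-freeze (Saltelli-style) Monte Carlo scheme, sketched below with numpy (in the study, the expensive WRF evaluations are replaced by a GPR surrogate before this step; the sample count and function name here are illustrative):

```python
import numpy as np

def sobol_indices(f, n_params, n_samples=4096, seed=0):
    """Pick-freeze estimates of the first-order (S1) and total-order (ST)
    Sobol' indices for f, with inputs uniform on [0, 1]^n.
    f maps an (N, n) sample matrix to a length-N output vector.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, n_params))
    B = rng.random((n_samples, n_params))
    f_A, f_B = f(A), f(B)
    total_var = np.var(np.concatenate([f_A, f_B]))  # D, the total variance
    S1 = np.zeros(n_params)
    ST = np.zeros(n_params)
    for i in range(n_params):
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]          # freeze every column except the i-th
        f_ABi = f(AB_i)
        S1[i] = np.mean(f_B * (f_ABi - f_A)) / total_var       # D_i / D
        ST[i] = 0.5 * np.mean((f_A - f_ABi) ** 2) / total_var  # 1 - D_{-i}/D
    return S1, ST
```

For an additive toy model such as f(x) = x1 + 2·x2, the exact indices are S1 = (0.2, 0.8), and since there are no interactions, the total-order indices coincide with the first-order ones.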

WRF model configuration and adjustable parameters
In the present study, the Advanced Research WRF (WRF-ARW) model version 3.9 (Skamarock et al., 2008) is used. Parameterization schemes represent the physical processes that are unresolved by the WRF model.
The WRF model consists of seven different parameterization schemes: microphysics, cumulus physics, shortwave radiation, longwave radiation, planetary boundary layer physics, land surface physics, and surface layer physics. The parameterization schemes used in this study are: the rapid radiative transfer model (Mlawer et al., 1997) for longwave radiation, the Dudhia shortwave scheme (Dudhia, 1989) for shortwave radiation, the revised MM5 scheme (Jiménez et al., 2012) for surface layer physics, the Unified Noah land surface model (Mukul Tewari et al., 2004) for land surface physics, the Yonsei University (YSU) scheme for planetary boundary layer physics, Kain-Fritsch (Kain, 2004) for cumulus physics, and the WRF Single-Moment 6-class (WSM6) scheme (Hong & Lim, 2006) for microphysics. A total of 24 tunable parameters are selected based on guidance from the literature (Di et al., 2015; Quan et al., 2016). The list of parameters and their corresponding ranges is presented in Table 1. Though the selected parameters may not cover all tunable parameters, the availability of computational resources limits the experimental design. Even so, the design is based on the most critical parameters that are likely to influence the model output significantly.

Simulation events, WRF model output variables, and observational data
In the present study, ten tropical cyclones that originated in the Bay of Bengal are selected for simulation (Srikanth et al., 2012). Figure 2 illustrates the IMD-observed tracks of the selected cyclones, with a clear indication of their category. Table 2 presents the details of these cyclones. The model outputs are validated against the Indian Monsoon Data Assimilation and Analysis (IMDAA) reanalysis data (Ashrit et al., 2020) and the Integrated Multi-satellitE Retrievals for GPM (IMERG) dataset (Huffman & Savtchenko, 2019). The IMDAA data are available at 0.12° × 0.12° resolution with a six-hour latency, and the IMERG data are available at 0.1° × 0.1° resolution with a thirty-minute latency. Since the model resolution is close to the validation data resolution, very little or no information is lost during regridding. The accumulated precipitation is validated against the IMERG data, while the remaining variables are validated against the IMDAA data. In addition, the maximum sustained wind speed (MSW) observations at the storm center of every cyclone, provided by the IMD at 3-hour intervals, are also used for validation.

Experimental setup
The sensitivity analysis requires a large set of parameter samples that are assigned to the WRF model, following which simulations are performed. The Uncertainty Quantification Python Laboratory (UQ-PyL) package is used to generate the parameter samples for the MOAT method. In contrast, the MARS and Sobol' methods require a different set of samples from the MOAT method. Based on previous studies (Ji et al., 2018; Wang et al., 2020), the quasi-Monte Carlo (QMC) Sobol' sequence design (Sobol', 1967) is employed to create 250 parameter samples per event, again using the UQ-PyL package. As with the MOAT samples, these parameter samples are assigned to the WRF model, and another 2500 simulations are performed for the cyclones under consideration. The output variables are extracted and stored at 6-hour intervals. The evaluation of sensitivities using the MOAT method requires only the WRF model simulations, whereas the MARS and Sobol' methods require a skill-score metric between the simulations and observations. In the present study, the RMSE between simulation and observation is employed as the skill-score metric, formulated as
$$RMSE = \sqrt{\frac{1}{IJKL} \sum_{l=1}^{L} \sum_{k=1}^{K} \sum_{j=1}^{J} \sum_{i=1}^{I} \big(sim_{ijkl} - obs_{ijkl}\big)^2}, \quad (12)$$
where $I$ and $J$ are the numbers of grid points in the latitudinal and longitudinal directions, $K$ is the number of time steps, $L$ is the number of cyclones, $sim$ is the simulated value, and $obs$ is the observed value. Since the same parameter set is employed for all the cyclones, equation (12) yields one RMSE value corresponding to one parameter sample. The parameter sets and RMSE values are given as inputs and targets to the MARS solver, and the MARS sensitivity indices are computed following the GCV equation (5).
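Equation (12) reduces to a single numpy expression once the simulated and observed fields are stacked into arrays (a sketch; the dimension ordering is our own choice):

```python
import numpy as np

def rmse_skill(sim, obs):
    """RMSE skill score of equation (12).

    sim, obs: arrays of shape (L cyclones, K times, I lat, J lon);
    returns the single scalar used as the target for MARS and the surrogates.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))
```

One such scalar is computed per parameter sample, so the 250 Sobol'-sequence samples yield a 250-element RMSE target vector for the surrogate models.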
The Sobol' method, as a quantitative sensitivity analysis method, gives more accurate and robust results, albeit at a much higher computational cost: it may require on the order of $10^3$ to $10^4 \times (n + 1)$ model runs (where $n$ is the number of parameters) to obtain accurate results. This is exceedingly challenging even when supercomputing facilities are available. To circumvent this difficulty, surrogate models can be used instead of running the WRF model for additional simulations. Surrogate models are machine learning tools that learn the empirical relations between the inputs (the parameter sets) and the targets (the RMSE values). In the present study, five surrogate models, namely Gaussian Process Regression (GPR) (Schulz et al., 2018), Support Vector Machines (SVM) (Radhika & Shashi, 2009), Random Forest (RF) (Segal, 2005), Regression Tree (RT) (Razi & Athappilly, 2005), and K-Nearest Neighbors (KNN) (Rajagopalan & Lall, 1999), are selected for evaluation. The surrogate models are provided with the parameter sets as inputs and the RMSE as the target, and are trained on these data. The goodness of fit is used as the accuracy metric, calculated as
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}, \quad (13)$$
where $N$ is the total number of samples, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the true values. The accuracy of the surrogate models is examined by 10-fold cross-validation, implemented as follows. The entire dataset is divided into ten equally sized subsets. The data in the $k$-th fold are kept as the test set, whereas the data from the remaining folds are used as the training set. The surrogate model is trained on this training set, and predictions corresponding to the test set are generated. This procedure is iterated over all folds, and the predictions from all folds are stacked into one set, producing a complete prediction set corresponding to the full dataset.
These two sets are provided as the predictions and true values to equation (13), and the goodness of fit is computed. The Structure-Amplitude-Location (SAL) score, proposed by Wernli et al. (2008), is used to quantify the accuracy of the simulations with the optimal and default parameters. The structure (S) is calculated as the difference between the simulated and observed fields of the ratio of the total value to its maximum value, and provides information regarding the shape and size of the variable field. The amplitude (A) is calculated as the difference between the domain averages of the simulations and the observations. The location (L) is measured as the normalized distance from the center of mass of the simulated field to that of the observed field. A better simulation has S, A, and L values close to zero, signifying higher accuracy.
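The verification pieces described above, i.e., the goodness of fit of equation (13), the stacked 10-fold cross-validation, and the simplified S, A, and L components, can be sketched as follows (numpy only; the function names, the `fit_predict` stand-in for a surrogate, and the grid-unit normalization of L are our own simplifications):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Goodness of fit, equation (13): 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def kfold_predictions(X, y, fit_predict, k=10):
    """Hold out each fold in turn, train on the rest, stack the predictions.

    fit_predict(X_train, y_train, X_test) stands in for any surrogate
    (GPR, SVM, RF, RT, KNN).
    """
    n = len(y)
    y_hat = np.empty(n)
    for test_idx in np.array_split(np.arange(n), k):
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        y_hat[test_idx] = fit_predict(X[train_idx], y[train_idx], X[test_idx])
    return y_hat

def sal_components(sim, obs, domain_diag):
    """Simplified S, A, L values for 2-D fields, per the description above."""
    amplitude = sim.mean() - obs.mean()                        # A
    structure = sim.sum() / sim.max() - obs.sum() / obs.max()  # S
    def centre_of_mass(field):
        idx = np.indices(field.shape).reshape(2, -1)
        return idx @ (field.ravel() / field.sum())
    location = (np.linalg.norm(centre_of_mass(sim) - centre_of_mass(obs))
                / domain_diag)                                 # L
    return structure, amplitude, location
```

A perfect simulation gives S = A = L = 0; a field that is merely displaced keeps S and A near zero while L grows with the displacement, which is what makes the three components complementary.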

Results and Discussion
In this section, the results of the sensitivity studies are presented in subsection 4.1, the physical interpretation of the parameters is discussed in subsection 4.2, and the results of a comparison study between simulations using the default and optimal parameters are presented in subsection 4.3.

MOAT method
Figure 3 shows that parameter P14 has the highest sensitivity for most of the variables, followed by parameter P6. The parameters P3, P4, P10, P15, P17, P21, and P22 also show high sensitivity to at least one of the variables.
In contrast, the parameters P1, P8, P11, P13, P16, P18, and P20 appear insensitive to all of the variables, and the remaining parameters make minimal contributions. A close observation of Figure 3 reveals that the variables OLR and DSWRF have the highest sensitivity to just one parameter each, whereas the remaining variables exhibit the highest sensitivity to at least two parameters.
The uncertainties in these sensitivities are quantified by examining the distributions of the parameter scores. Since the available data points are limited to only ten samples (one per cyclone), a resampling method can be employed to procure more samples without further numerical model runs. Bootstrap resampling (Efron & Tibshirani, 1993) is an efficient way to generate samples of the same size as the original dataset, with replacement. In the present study, the bootstrap method is applied 100 times to generate ten samples with replacement in each application, creating a new dataset of size (100 × 10) for each parameter and variable. The distribution of each parameter is illustrated as a boxplot in Figure 4. In this figure, for every parameter, the horizontal red line inside the box indicates the median value, the upper and lower bounds of the box are the mean ± one standard deviation, and the upper and lower whiskers are the maximum and minimum values. The boxplot shows that the most sensitive parameters exhibit either a higher variance or a higher median value (Wang et al., 2020).
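The bootstrap step can be reproduced in a few lines (a numpy sketch; summarizing each replicate by its mean score is our reading of how the boxplot statistics are formed):

```python
import numpy as np

def bootstrap_replicates(scores, n_boot=100, seed=0):
    """Resample the per-event sensitivity scores with replacement n_boot
    times (here 10 scores per parameter, one per cyclone) and return the
    mean of each replicate; boxplot statistics follow from these."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))
    return scores[idx].mean(axis=1)
```

The spread of the returned replicate means gives the variance shown by the boxplot whiskers, at the cost of zero additional WRF runs.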
For the variable OLR, Figure 4(f) shows that the parameter P10 has the highest median value with large variance, whereas the parameters P6 and P12 have the lowest median values, also with large variances. Figure 4(g) shows that parameter P14 has the highest sensitivity to the variable DSWRF with very little variance, whereas the sensitivity of the remaining parameters is comparatively minimal. Figures 4(c,e,h) show that the variables SAP, PBLH, and DLWRF have more than three sensitive parameters.
The results show that, except for the variables DSWRF and OLR, all the variables have at least two highly sensitive parameters. The boxplot results thus corroborate the heatmap results.

MARS method
The GCV scores of the 24 parameters corresponding to the selected variables are calculated using the MARS method. Figure 5 illustrates the heatmap of normalized GCV scores, with 1 indicating the highest sensitivity and 0 the least. The intensity signatures of Figure 5 are very consistent with those of Figure 3. The results show that most of the variables are sensitive to P14, followed by parameter P6. In addition, the parameters P3, P4, P10, P15, P17, and P22 affect at least one of the dependent variables, while P1, P2, P8, P11, P13, P16, P18, P19, P20, P21, and P24 do not exert a significant influence on any of the variables. A close observation of Figure 5 reveals that the variables SAT, OLR, and DSWRF are sensitive to only one parameter each, whereas the variables SAP, PBLH, and DLWRF are influenced by more than three parameters. The distribution of the results is again obtained with the bootstrap method, applied 100 times to generate 250 samples with replacement in each application, creating a new dataset of size (100 × 250) for each parameter and variable. Figure 6 shows the boxplot of the MARS GCV scores generated from the bootstrap resampling dataset.
Validation of surrogate models: Figure 7 shows the distribution of the R² scores of the different surrogate models for the selected meteorological variables, obtained by bootstrap resampling. In Figure 7, each subplot corresponds to one meteorological variable, and the horizontal and vertical axes indicate the surrogate models and the goodness-of-fit (R²) value, respectively. For every meteorological variable, the GPR model has the highest R² value, close to 1, with minimal variance. This implies that the GPR model can accurately capture the empirical relations between inputs and outputs.
In contrast, the remaining surrogate models show high variance for at least one of the variables. Figure 7(c) shows that the regression tree has the highest variance with the lowest R² value, with the lower whisker below zero, which indicates the inability of the RT to capture the correlations. In every subfigure, the R² value of KNN is close to 0.5, which implies that the model can explain only 50% of the total variance around the mean. The SVM and RF surrogate models have very similar accuracy except for the variable OLR, for which the SVM shows high variance with an R² value close to 0.5. These results indicate that all models except GPR have inconsistencies in their accuracy. Figure 8 shows a scatter plot of the WRF model output against the GPR fit for the eight variables under consideration; each subplot corresponds to one meteorological variable, and the horizontal and vertical axes indicate the WRF model output and the GPR fit, respectively. From the R² values shown in the plots, it is clear that the GPR model explains 98% of the variability of the output data around the mean, except for surface pressure, for which the R² value is 0.9.
Results of surrogate-based Sobol' method: The GPR model, built upon the 250 samples, is used to predict the outputs of 50000 samples generated by the Sobol' sequence, and these outputs are used to estimate the Sobol' sensitivity indices corresponding to each variable. Figure 11 shows these indices; in each subfigure, the blue bar shows the first-order (primary) effects, the red bar shows the higher-order (interaction) effects, and their sum gives the total-order effects. The advantage of the Sobol' method is that it can quantify the interaction effects, which is not possible with the MOAT or MARS methods.
Figures 11(c,f) show that SAP and OLR have considerable higher-order effects, which indicates that interactions predominate for these two variables, whereas the variable DLWRF has very minimal secondary effects. Figures 11(b,g) show that the variables SAT and DSWRF have only one sensitive parameter each, while Figures 11(c,e,h) show that the variables SAP, PBLH, and DLWRF are influenced by more than three parameters. These results corroborate the conclusions drawn from the heatmaps.
The results from the Sobol' method indicate that only a few parameters contribute much of the sensitivity of the output variables. Figure 12 shows the aggregate relative contribution of the total-order effects of each parameter for the selected variables. The abscissa indicates the output variables, the ordinate indicates the relative importance, and the parameters are indicated by different colors. From the figure, it is evident that only 8 of the 24 parameters, namely P3, P4, P6, P10, P14, P15, P17, and P22, are responsible for more than 80% of the total sensitivity of every variable. Unlike the MOAT and MARS scores, the Sobol' indices follow from a deterministic variance decomposition evaluated on the surrogate, which makes these quantitative results more robust.

Physical interpretation of parameter sensitivity
The results obtained by the three sensitivity analysis methods suggest that only a few parameters strongly influence the meteorological variables under consideration in this study. The sensitivity indices of parameters obtained by the three methods are added over all variables and are normalized to [0, 1].
The results are shown in Figure 13 in descending order, which indicates the ranks of the parameters. The results show that the parameter P14 is the most sensitive parameter among all. This represents the scattering tuning parameter used in the shortwave radiation scheme proposed by Dudhia (1989).
This parameter is used in the downward component of the solar flux equation (Montornès et al., 2015). It is the main constant associated with scattering attenuation and directly affects the solar radiation reaching the ground in the form of DSWRF. When a cloud is present in the atmosphere, it attenuates the downward solar radiation; simultaneously, it contributes to the downward longwave radiation by means of multiple scattering. Since the Dudhia (1989) scheme does not represent the multiple-scattering process, parameter P14 attenuates the downward radiation without any contribution to the heating rate (Montornès et al., 2015), which leads to changes in the DLWRF. The land surface model transforms the solar radiation into other kinds of energy, such as latent heat (LH) and sensible heat (SH), near the surface. This implies that changes to the downward radiation also affect the LH and SH. The planetary boundary layer is governed by the LH and SH; therefore, changes in the DSWRF ultimately affect the PBLH (Montornès et al., 2015). A higher value of P14 decreases the downward solar radiation and the surface-level heating, which ultimately reduces the surface air temperature (SAT). The study of Quan et al. (2016) shows that changes in SAT lead to variations in relative humidity. Due to the correlation between SAT, humidity, and SAP, variations in SAT and humidity lead to variations in the SAP.
The parameter P6 is the multiplier for the entrainment mass flux rate in the Kain-Fritsch cumulus physics scheme. It determines the amount of ambient air entraining into the updraft flux, which dilutes the updraft parcel. A high value of P6 indicates a large amount of ambient air entrainment into the air parcel, making the atmosphere more stable. This suppresses the formation of deep convection, which decreases convective precipitation. Since shallow convection removes a large amount of the instability, more stable stratiform clouds form, ultimately resulting in high precipitation. The occurrence of precipitation decreases the SAT and increases the relative humidity, which leads to changes in the SAP. This parameter also alters the formation of clouds, which in turn affects the variables that depend on clouds, such as OLR, DSWRF, and DLWRF (Quan et al., 2016). Evaporation is the main constituent of cloud formation. Since the parameter P17 affects evaporation, the DLWRF, which depends on clouds, will also be affected by P17.
The parameter P10 is the scaling factor applied to the ice fall velocity used in the microphysics scheme. This parameter controls the ice terminal fall velocity, which governs the sedimentation of ice crystals. Cloud constituents such as cloud water and cloud ice are affected by the sedimentation of ice crystals. Since cloud water and cloud ice reflect radiation into outer space, any change in the parameter P10 causes variations in the OLR (Quan et al., 2016; Di et al., 2017; Ji et al., 2018). The parameter P4 is the Von Kármán constant used in the surface layer scheme (Jiménez et al., 2012) and the PBL scheme. This constant relates the flow speed profile in a wall-normal shear flow to the stress at the boundary. It directly influences the bulk transfer coefficients of momentum, heat, and moisture, and the diffusivity coefficient of momentum. This implies that changes in P4 bring implicit variations in surface pressure and moisture, which lead to changes in precipitation (Wang et al., 2020).
The parameter P22 is the profile shape exponent for calculating the momentum diffusivity coefficient used in the PBL scheme. It is directly related to P4, since both appear in the diffusivity coefficient of the momentum equation. This parameter regulates the mixing intensity of turbulence in the boundary layer and therefore affects the planetary boundary layer height (PBLH) (Quan et al., 2016; Di et al., 2017; Wang et al., 2020). The parameter P15 is the diffusivity angle for the cloud optical depth computation used in the longwave radiation scheme proposed by Mlawer et al. (1997). The longwave radiation irradiating back to the Earth's surface is attenuated by the diffusivity factor (the inverse of the cosine of the diffusivity angle) multiplied by the optical depth. Thus, changes in P15 directly cause variations in DLWRF (Quan et al., 2016; Di et al., 2017; Iacono et al., 2000; Viúdez-Mora et al., 2015). The parameter P3 is the scaling factor for surface roughness used in the surface layer scheme (Jiménez et al., 2012). A smooth surface lets the flow remain laminar, whereas a rough surface drags the flow, thereby affecting the near-surface wind speed (Nelli et al., 2020). In this way, parameter P3 is directly related to the wind speed, and any change in P3 will also affect the surface wind speed (Wang et al., 2020).

A comparison between simulations with the default and optimal parameters
The objective of the present work is to identify the most important parameters that greatly influence the model output variables. To illustrate whether parameter optimization can improve model prediction, WRF simulations with the default and optimal parameters were compared for meteorological variables such as precipitation, SAT, wind speed, and maximum sustained wind speed. The parameters with which the model simulations show the lowest RMSE with respect to the observations are selected as the optimal parameters. The RMSE values of surface wind, surface temperature, and precipitation for the default and optimal simulations are shown in Table 3. The results show that the optimal simulations have a smaller RMSE for surface wind (2.089 m/s) than the default simulations (2.6 m/s). The percentage improvement is calculated as the reduction in RMSE between the default and optimal simulations, expressed as a percentage of the default RMSE. Table 3 shows that a 19.65% improvement is achieved by using the optimal parameters over the default parameters for the wind speed at 10 m height, denoted as WS10. In the same way, the optimal parameters yield 6.5% and 13.2% improvements for surface temperature and precipitation, respectively, over the default parameters. Figure 14 shows the spatial distribution of the anomaly in the variables, evaluated as the difference between the simulations with the default set of parameters and the observations on the left panel, and the difference between the simulations with the optimal set of parameters and the observations on the right. The IMDAA data is used to validate the wind speed and SAT, and the IMERG rainfall data is used to validate precipitation. Since the spatial plots provide only a visual comparison, the SAL metric is also used to analyze the results quantitatively, as shown in Figure 15.
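The RMSE and the percentage-improvement calculation can be reproduced directly. The sketch below uses the surface-wind RMSE values reported in Table 3 and confirms the quoted improvement figure; the function names are illustrative, not the paper's.

```python
import numpy as np

def rmse(sim, obs):
    """Root-mean-square error between simulated and observed fields."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def percent_improvement(rmse_default, rmse_optimal):
    """Reduction in RMSE of the optimal run, as a percentage of the default RMSE."""
    return 100.0 * (rmse_default - rmse_optimal) / rmse_default

# with the surface-wind RMSEs from Table 3 (default 2.6 m/s, optimal 2.089 m/s):
improvement = percent_improvement(2.6, 2.089)   # ≈ 19.65 %
```

The same two-line calculation yields the 6.5% (surface temperature) and 13.2% (precipitation) improvements from the corresponding RMSE pairs in Table 3.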
For surface wind, Figures 14(a,b) show that the default simulations have a large positive bias over the Bay of Bengal, whereas the optimal simulations have a ±1 m/s bias over the domain, and Figure 15(a) shows that the optimal simulations have SAL scores closer to zero than the default simulations (for a better forecast, the SAL scores should be close to zero). A positive value of structure indicates that the structures of the simulated variables are larger than the observed structures, and a negative value indicates the opposite. The structure (S) score of the default simulations (0.0405) is closer to zero than that of the optimal simulations (-0.0644), which indicates that the default simulations captured the wind patterns better than the optimal simulations. This can be attributed to the larger spatial distribution of negative bias in the optimal simulations compared to the default simulations, as shown in Figures 14(a,b). A positive value of amplitude implies that the domain-averaged values of the simulated quantities are greater than the observed ones, and a negative value implies the opposite. The amplitude (A) score of the optimal simulations (0.0908) is closer to zero than that of the default simulations (0.1666), which indicates that the optimal parameters simulated the intensities better than the default parameters. The location (L) score of the optimal simulations (0.1153) is closer to zero than that of the default simulations (0.2107), which indicates that the optimal simulations have a smaller spatial deviation than the default simulations. For SAT, Figures 14(c,d) show that the default and optimal simulations have a similar spatial distribution, with the default simulations showing a positive bias over the north-west region and the optimal simulations a negative bias over the south-west region. Similar results are observed in the SAL scores in Figure 15(b), where the scores from the optimal parameters are better than those from the default parameters.
Figures 14(e,f) show that the optimal simulations have smaller positive and negative biases over most places compared to the default precipitation, and the findings are quantified using the SAL scores shown in Figure 15(c). The S, A, and L values for the optimal simulations (0.2372, 0.0255, 0.0704) are closer to zero than those of the default simulations (0.2866, 0.1089, 0.0805), which implies that the optimal simulations are better than the default simulations.
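Of the three SAL components, the amplitude term has the simplest closed form and illustrates how the scores above are interpreted. The sketch below implements only the amplitude component as defined by Wernli et al. (2008); the structure and location components additionally require object identification in the field and are omitted here.

```python
import numpy as np

def sal_amplitude(sim, obs):
    """Amplitude (A) component of the SAL metric (Wernli et al., 2008).

    A compares the domain-averaged values of the simulated and observed
    fields: A = 0 is a perfect amplitude match, A > 0 means the model
    over-predicts the domain mean, and A < 0 means it under-predicts.
    The normalization bounds A within [-2, 2].
    """
    d_sim = float(np.mean(sim))
    d_obs = float(np.mean(obs))
    return (d_sim - d_obs) / (0.5 * (d_sim + d_obs))

# an over-predicting field gives a positive amplitude score
obs = np.full((10, 10), 2.0)   # hypothetical uniform observed field
sim = 1.2 * obs                # simulation 20% too strong everywhere
a = sal_amplitude(sim, obs)    # (2.4 - 2.0) / 2.2 ≈ 0.18
```

An A score of 0.0908 for the optimal wind simulations versus 0.1666 for the default thus means the optimal run's domain-mean wind is closer to the observed mean.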
The Maximum Sustained Wind speed (MSW) is one of the primary measures of the intensity of a cyclone. In addition to the spatial distributions of the variables, the MSW is also compared for the default and optimal simulations, as shown in Figure 16. From the WRF simulations using QMC samples, the MSW values of the ten cyclones are extracted at 6-hour intervals from the 18th to the 84th forecast hour, and the data thus obtained are averaged over all the cyclones. A boxplot generated from these data shows that uncertainties in the parameters significantly affect the MSW simulations. The simulated MSW values with the default and optimal parameters are plotted along with the observed IMD MSW values, which show that the optimal simulations match the observations more closely than the default simulations. These results indicate that calibration of the parameters can substantially improve model predictions.
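The temporal sampling and cyclone-averaging step can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the two MSW series are synthetic placeholders, and only the averaging logic mirrors the procedure described above.

```python
import numpy as np

def average_msw(msw_by_cyclone):
    """Average maximum sustained wind across cyclones at each forecast hour.

    msw_by_cyclone: array of shape (n_cyclones, n_times) with MSW values
    sampled every 6 h from the 18th to the 84th forecast hour
    (12 time levels). Returns the cyclone-mean MSW time series.
    """
    return np.asarray(msw_by_cyclone, float).mean(axis=0)

forecast_hours = np.arange(18, 85, 6)        # 18, 24, ..., 84 -> 12 points
# two hypothetical cyclones' MSW (m/s) at those hours
msw = np.array([np.linspace(20, 45, 12),
                np.linspace(25, 50, 12)])
mean_msw = average_msw(msw)                  # element-wise mean of the two series
```

In the study, the same averaging is applied over all ten cyclones for every QMC parameter sample, and the spread of the resulting curves is what the boxplot in Figure 16 visualizes.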

Conclusions
The present study evaluated the sensitivity of eight meteorological variables, namely surface wind speed, surface air temperature, surface air pressure, precipitation, planetary boundary layer height, downward shortwave radiation flux, downward longwave radiation flux, and outgoing longwave radiation flux, to 24 tunable parameters for the simulations of ten tropical cyclones over the BoB region. The tunable parameters were selected from seven physics schemes of the WRF model. Eight parameters (P14 - the scattering tuning parameter, P6 - the multiplier for the entrainment mass flux rate, P17 - a parameter affecting evaporation, P10 - the scaling factor for the ice fall velocity, P4 - the Von Kármán constant, P22 - the profile shape exponent for calculating the momentum diffusivity coefficient, P3 - the scaling factor related to surface roughness, and P15 - the diffusivity angle for cloud optical depth) were found to contribute 80%-90% of the total sensitivity metric. A comparison of the WRF simulations with the default and optimal parameters against observations showed a 19.65% improvement in the surface wind prediction, a 6.5% improvement in the surface temperature prediction, and a 13.2% improvement in the precipitation prediction when the optimal set of parameters is used instead of the default set. These results indicate that the calibration of model parameters using advanced optimization techniques can further improve the prediction of tropical cyclones in the Bay of Bengal.

Acknowledgements The authors are thankful to the international scikit-learn community for developing and providing the APIs. The figures are plotted using the Python language and the NCAR Command Language (NCL).

Declarations
Funding No funding was received to assist with the preparation of this manuscript

Conflicts of interest The authors declare no conflict of interest
Availability of data and material The datasets generated during the current study are available from the corresponding author on reasonable request

Figure 16: The MSW of all cyclones at the corresponding forecast hours are averaged, using which the boxplots are generated. The green line shows the simulations with the default parameters, the blue line shows the simulations with the optimal parameters, and the red line shows the observed MSW. The data is collected at 6-hourly intervals and is plotted accordingly.