Assessing optimal set of implemented physical parameterization schemes in a multi-physics land surface model using genetic algorithm

Optimization of land surface models has been challenging due to the model complexity and uncertainty. In this study, we performed scheme-based model optimizations by designing a framework for coupling “the micro-genetic algorithm” (micro-GA) and “the Noah land surface model with multiple physics options” (Noah-MP). Micro-GA controls the scheme selections among eight different land surface parameterization categories, each containing 2–4 schemes, in Noah-MP in order to extract the optimal scheme combination that achieves the best skill score. This coupling framework was successfully applied to the optimizations of evapotranspiration and runoff simulations in terms of surface water balance over the Han River basin in Korea, showing outstanding speeds in searching for the optimal scheme combination. Taking advantage of the natural selection mechanism in micro-GA, we explored the model sensitivity to scheme selections and the scheme interrelationship during the microGA evolution process. This information is helpful for better understanding physical parameterizations and hence it is expected to be effectively used for further optimizations with uncertain parameters in a specific set of schemes.


Introduction
Land surface models (LSMs) have significantly advanced in recent years, but their optimization has been challenging due to their increased complexities and number of uncertainties, which require tremendous computing resources.LSMs are generally developed to represent better large-scale characteristics of surface hydrology, biophysics, and bio-geochemistry in terms of interactions between the land and atmosphere.However, models inevitably have uncertainties due to our insufficient knowledge of nature.The uncertainties often result from the unreasonable representation of the spatio-temporal surface heterogeneity.In order to address this issue for more extensive application studies, model optimizations through parameter calibrations have been essentially used.Model optimization, which calibrates uncertain parameters based on observations, is one of the widely used methods that apply model runs to large-scale studies.Such methods often include parameter sensitivity analyses for effective optimizations (Gupta et al., 2000;Jackson et al., 2003;Mo et al., 2008;Nasonova et al., 2011;Rosero et al., 2010;Williams and Maxwell, 2011).To make model runs more reliable, previous studies have calibrated several uncertain parameters in only one or two schemes related to their targeted variables (Cretat and Phol, 2012;Essery et al., 2013;Miguez-Macho and Fan, 2012), sometimes multiple parameters in many schemes (Moriasi et al., 2007;Rosolem et al., 2013).However, this type of optimization tends to be limited to only a few sites due to the tremendous computing resources and time.
Model optimization techniques that maximize computation effectiveness were recently developed.One of the methods increasingly used for model calibration is "the genetic algorithm" (GA).The fundamental concept of GA is the natural selection of genes (parameters) for object evolution (Holland, 1992).A marked advantage of GA is its smart search for an optimal combination of parameters by considering interactions among various uncertain parameters, skipping separate model sensitivity experiments.In light of this advantage, GA has been used for various numerical models and spotlighted as an effective and reliable technique for handling the issues of the quantitative increase in uncertain parameters in numerical models (Fang et al., 2003;Rosolem et al., 2013;Yu et al., 2013).
Another considerable problem in model optimizations is a possible opposite effect among the implemented schemes in a complex model.For example, improvement of a single physical process through certain parameter optimizations can be followed by exacerbations of other processes.However, this opposite effect may not be sufficiently addressed by considering merely uncertain parameter interactions, because the increased complexity of models is associated with not only the increases in the number of uncertain parameters but also the augmentations of new parameterization schemes.Rosero et al. (2010) revealed that model sensitivity to parameters can vary according to the choice of scheme as well as parameters associated with land surface heterogeneity.Their study implies that interactions among the implemented schemes in a model may induce further considerable uncertainty for model optimization.This issue is of great importance when constructing a new model from various pre-developed parameterization schemes for their large-scale applications.
In the preceding context, the model's sensitivity to the implemented schemes and the interrelationship among schemes need to be investigated for more effective model optimization.In this study, we designed a model optimization system by applying a GA technique to a multi-scheme LSM in order to investigate the model sensitivity to schemes and the scheme interrelationship.In this system, GA induces scheme-based optimization by controlling the LSM.We used an efficient version of GA, the so-called "the micro-genetic algorithm" (micro-GA), that uses a small number of investigation samples.
The purpose of the application of this coupled system is (1) to maximize the effectiveness of extracting an optimal scheme set from the LSM and (2) to assess how the interrelationship among the schemes contributes to the effectiveness of posterior parameter optimizations by taking advantage of the smart searching mechanism of GA.In order to disclose the interrelationship among the implemented schemes, we target two different variables -evapotranspiration (ET) and runoff in this study.These are important processes to evaluate the accuracy of the model's surface water balance.Since the two variables are closely linked in terms of surface water partitioning, it will be useful to check a possible opposite effect between them.

Noah LSM with multiple physics options
For a multi-scheme LSM, a new version of the Noah LSM with multiple physics options (hereinafter Noah-MP) was used.The Noah LSM has evolved through the cooperative efforts of various institutions such as the National Center for Environmental Prediction (NCEP) and the Air Force Weather Agency.Using Noah LSM 3.0v as the baseline, Noah-MP was augmented with multiple physics options with regard to 10 different land surface processes (Niu et al., 2011).The augmentations were basically intended to improve the Noah LSM in terms of phenology computation, snow treatment, and groundwater representation.Among the ten physical categories, dynamic vegetation and its paired scheme for the stomatal resistance, the Ball-Berry scheme (Ball et al., 1987;Collatz et al., 1991), were fixed in this study.Thus, eight parameterizing categories were totally used to generate scheme combinations as summarized in Table 1.
There are two available options (schemes) in the paratemterizing surface exchange coefficient for heat (SFC).The difference in two options is related to the effect of surface roughness length on the heat transfer of the land surface to the atmosphere.SFC(1) determines the surface heat coefficient by the roughness length of heat and momentum while SFC(2) treats the effect of the roughness lengths with the zero-displacement height (Chen et al., 1997).The schemes for supercooled liquid water in frozen soil (FRZ) deal with liquid water around frozen soil particles that has different freezing-point depressions with soil depths.FRZ(1) uses more general form of the equation (Niu and Yang, 2006), but FRZ(2) uses a variant freezing-point depression equation with increased soil-water interface (Koren et al., 1999).The schemes for frozen soil permeability (INF) are involved in parameterizations of soil hydraulic properties such as infiltration in frozen soil, with two available options; INF(1) defines the soil permeability using soil moisture (Niu and Yang, 2006) while INF(2) uses only the liquid water volume (Koren et al., 1999).There are two available options in the snow surface albedo schemes (ALB); ALB(1), adopted from Biosphere-Atmosphere Transfer Model (BATS), computes more albedo (considering various snow properties such as snow age and impurity) than ALB(2), adopted from Canadian Land Surface Scheme (CLASS).Four options are available in the runoff category (RUN), and their differences are mainly related to dealing with subsurface runoff.The base scheme of RUN(1) and RUN(2) is the topography-based hydrological model (TOPMODEL).A simple groundwater model was added to TOPMODEL in RUN(1) (Niu et al., 2007) in spite of an equilibrium water table concept in RUN(2) (Niu et al., 2005).RUN(3) and RUN(4) are the scheme from the original Noah LSM (Schaake et al., 1996) and from BATS (Yang and Dickinson, 1996), respectively, both of which uses free drainage concept for subsurface runoff.Three options are available in the schemes for the soil moisture factor controlling stomatal resistance (BTR); BTR(1) from Noah LSM uses soil moisture (Chen et al., 1996) and BTR(2) and BTR(3) that are from BATS (Oleson et al., 2004) and the Community Land Model (CLM) (Xue et al., 1991), respectively, uses metric potential to compute the factor.Vegetation distribution affects radiation transfer at the land surface with solar zenith angle.The three schemes of twostream radiation transfer (RAD) treat the gaps differently with regard to the pattern of vegetation distribution as follows; RAD(1) considers three-dimensional structure of vegetation canopy (Niu and Yang, 2004), RAD(2) does not consider the canopy gab, and RAD(3) treats the gab from unity minus the green vegetation fraction.Partitioning of precipitation into rainfall and snowfall (SNF) is determined generally based on air temperature in most LSMs.While SNF(2) and SNF(3) use only a single reference temperature to differentiate between rain and snow, SNF(1) uses a complex functional form for differential partitioning rate based on air temperature (Jordan, 1991).Using the eight categories, the total number of possible scheme combination is 1728.The more detailed description of each scheme, including equations and parameter settings, the reader should refer to Niu et al. (2011).

Micro-GA
GAs, the idea behind which was borrowed from biological evolution and adaptation concepts in genetics, are heuristic optimization methods based on natural genetic variation and natural selection that pursue a cost-effective solution (Holland, 1992;Hu et al., 2006;Mitchell, 1998;Wang et al., 2002).These methods have been increasingly applied to parameter optimizations in various hydrological models (Bastani et al., 2010;Bulatewicz et al., 2009;Uddameri and Kuchanur, 2007) and to those in numerical weather predictions (Fang et al., 2009;Krishnakumar, 1989;Lee et al., 2006;Yu et al., 2013 improved version of GA with smaller generation sizes and simplified genetic modifications, hence efficiently reducing the computational resources (Krishnakumar, 1989;Reeves, 1993;Wang et al., 2010).The fundamental mechanism of micro-GA for model optimization is to evaluate individual members in a group (or generation) to select an elite (natural selection) based on a fitness function (or skill score), to bear new offspring members using the genes in the selected elite through stochastic modifications of their chromosomes, such as a crossover mechanism, and then to reconstitute a new generation with the offspring members.While the reconstitutions with the natural selection and crossover mechanism are repeated, the generations evolve and converge to the optimum.

Coupling micro-GA with Noah-MP
Micro-GA was applied for the scheme-based optimization using Noah-MP (hereinafter MP-MGA).In this coupled system, a scheme combination represents an individual, and a group of multiple scheme combinations represents a generation.Each scheme combination is constituted with the eight physical categories as mentioned in Sect.2.1, and each category acts like a chromosome.The generation of a scheme combination from various multi-optional physical processes is easily controlled by discrete numbers from micro-GA; each scheme option in a category is assigned to a unique discrete number in the MP-MGA coupled system.
In the MP-MGA, micro-GA controls the choices of physical schemes to produce scheme generations and conducts a series of model runs using Noah-MP.The flow chart in Fig. 1 summarizes the optimization process in MP-MGA.Two basic parameters must be set to operate micro-GA: the number of individual members in a generation and the number of generations for iteration.These numbers are critical to evaluate the efficiency of the MP-MGA optimization.In this study, each generation was set to 10 individual members and the number of generations was set through the validation experiment (see Sect. 3).
The first basic task of micro-GA is to generate 10 sets of scheme combinations by selecting one discrete number for one category.Once micro-GA establishes the 10 scheme combinations, it applies these combinations to Noah-MP by editing the "namelist.input"file one by one.The second task is to select the best scored scheme combination based on a customized fitness function (see Sect. 2.2.2).Until Noah-MP finishes all the 10 simulations, micro-GA collects the skill scores of each simulation.The skill scores are calculated by comparing the simulations to the observations based on the fitness function.The last task of micro-GA is to generate the next generation through the "crossover" mechanism (exchange of the chromosomes) using the best scored scheme combination from the previous generation.Micro-GA continues this task, exploring better generations than the previous ones until the optimization process converges upon the optimal value.The MP-MGA operation procedure can be summarized as follows (see Fig. 1): 1. Micro-GA produces an initial set of scheme combinations through random selections.The random selections are generated from a uniform random deviate between 0 and 1.
2. Micro-GA controls Noah-MP by editing the "namelist.input"file for the scheme choices and executing the model.
3. After Noah-MP finishes one simulation with the given scheme selections, its skill score is calculated through statistical comparison to the observation data based on the fitness function.
4. Micro-GA collects the skill scores until all the simulations for the current generation are completed.
5. Once micro-GA collects all the skill scores of the generation, it selects the best one (elitism) and stores the selected scheme combination for the next generation.
6. Another set of scheme combinations for the next generation is produced through the "crossover" mechanism using the selected one from the previous step.
7. Micro-GA repeats the steps from two to five above until the generations converge upon the optimal value.
Through the evolution process above, the optimization can possibly converge upon two types of optimal values: global (or true) and local (or false) optimum.The global optimum is the true best skill score the model can produce.In other words, it is the ultimately maximized evolution point of the generations.The local optimum, on the other hand, is a possible premature point that the optimization process can fall into.To avoid any convergence of model optimizations upon a local optimum, micro-GA exerts "restart" of the natural selection processes at some iteration point, keeping alive the best individual obtained through the previous iterations.This "restart" point is determined through the calculation of the total number of chromosome crossovers.The number of chromosome crossovers is supposed to gradually decrease with the progression of the generation evolution because the natural selection mechanism gives more surviving chance to the chromosomes that have advantages to improve model accuracy.Thus, micro-GA decides that the model optimization process is converged upon an optimum when the number of chromosome crossovers at an iteration is less than 5 % of the total number of chromosomes.Once micro-GA decides the optimum convergence, a random generation is regenerated through random chromosome selections, with the addition of the finally survived elite member from the previous iteration.

Fitness function
In the GA optimization, we need to define a functional to be optimized -the fitness function.Evaluation skills used in the fitness function are very subjective, depending on the study objectives.We may use various statistical indices such as correlation coefficient and root mean square error for a reliable model evaluation.However, because GA uses only one fitness function, we should be very careful to select an evaluation skill for more comprehensive model evaluation.In this study, we used a skill scoring technique, using commonly used statistical indices: correlation coefficient (R), normalized standard deviation (σ norm ), and normalized average (υ norm ) based on observation data.It produces a single skill score (S) from the three statistical indices following (Taylor, 2001) as (1) The closer the skill score is to unity, the more accurate the model output is.In this study, we evaluate the MP-MGA coupled system for surface water budget components: evapotranspiration and runoff.

Study domain and data
The land surface processes in the model were forced by six meteorological fields from the Global Land Data Assimilation System (GLDAS) Version 1 data (Rodell et al., 2004) -(1) precipitation, (2) downward shortwave radiation, (3) downward longwave radiation, (4) near-surface air temperature, (5) near-surface wind speed, and (6) surface pressure.This data set has been developed by the National Aeronautics and Space Administration (NASA) Goddard Space Flight Center (GSFC) and the National Oceanic and Atmospheric Administration (NOAA) NCEP.The GLDAS meteorological forcing has been processed, using the atmospheric data assimilation system (ADAS) reanalysed by various operational forecasts and observations.Especially The 3 year GLDAS forcing data from 2001 was processed for the Han River basin in Korea (Fig. 2).The basin area is about 23 000 km 2 .For the model's static input for the land surface characteristics, the Kongju National Land Cover data (KLC; Kang et al., 2010) and soil texture map from the Farm and Organization of the United Nations (FAO, 2002) were used.The KLC data are recently developed and represent land cover for East Asia, using the satellite vegetation products from Moderate Resolution Imaging Spectro-radiometer (MODIS).The study region is under a moderate humid climate condition with about 2.5 mm day −1 and 12 • C of annual average precipitation rate and temperature, respectively.The major vegetation is mixed forest according to KLC.The previous six month data (July to December 2000) were utilized for the model initialization.
For the evaluation of ET simulations, the monthly MODIS ET data set (MOD16A2) was utilized.This data set was estimated from land cover, surface albedo, leaf area index, and fraction of photosynthetically active radiation using an improved ET algorithm based on the Penman-Monteith equation (Monteith, 1965;Mu et al., 2011).Mu et al. (2011) addresses that the MODIS ET uncertainties can be introduced from many sources: uncertainties from input data such as MODIS leaf area index and meteorological data, those from up-scaling processes (tower to landscape), and those of model parameters used in the derivation algorithm.The ET data have been validated for North American flux tower sites.There is a flux tower (located at 37.76 • N, 127.15 • E) available for ground-based ET monitoring in the Han River basin (Kang et al., 2009), but the measurements are not covering the research period of this study (2001)(2002)(2003).However, according to the flux tower observations, the 5 year average of ET from 2004 to 2008 was about 0.93 mm day −1 .MODIS ET of the grid point covering the flux tower site showed 0.95 mm day −1 with a reasonable correlation coefficient (0.86) with the observation for the same period.Thus, it is considered that the possible errors of MODIS ET would not be great at least for the Han River basin.
For the runoff observation data, we obtained the stream flow observation from the Water Management Information System in Korea (WAMIS), and the daily-base observations are available via http://www.wamis.go.kr.We assumed that the monthly-base stream flow observation at the final outlet point of the Han River basin represents the monthly variation of runoff over the entire basin.
The models were evaluated for ET and runoff through monthly-averaged comparisons after the area aggregations for the Han River basin.The actual spatial and temporal resolutions in the simulations were 0.25 degrees and 3 h, respectively.

Evaluation of MP-MGA for the scheme-based optimization
This section addresses the evaluation of MP-MGA, in terms of capability and efficiency by comparing 3 year simulations in the Noah-MP stand-alone mode and the MP-MGA coupled mode.

Model sensitivity to scheme choices in a physical category
For proper model initialization, we first examined the spin-up time when the ET and runoff simulations from the Noah-MP model reaches equilibrium.Since each set of scheme combination represents an individual model, it is interesting to investigate how the model spin-up time varies with the choices of schemes.Model runs for 2001 were repeatedly performed for all scheme combinations until the variables reached equilibrium.The criterion for the equilibrium is defined as the time when the difference in annual means between two consecutive one-year simulations is less than 0.1 % of the means (Cai et al., 2014;Yang et al., 1995).The spin-up periods of the all scheme combinations were no more than two years, and their differences were quite small among the scheme choices even though the scheme choices in RUN showed the greatest differences.The spinup periods for ET simulation were slightly lower than those for runoff simulation, but the differences are also negligible.From the Noah-MP stand-alone mode, the three statistical indices and the skill scores for all scheme combinations were obtained.Table 2 displays averaged indices for simulations of all scheme combinations with one fixed scheme in a certain physical category.The physical categories that show bigger differences in the averaged skill scores are considered to be more sensitive to model performances.
For the ET simulations, the scheme combination having the best skill score (0.806) is as follows: SFC(2), FRZ(1), INF(1), ALB(1), RUN(3), BTR(1), RAD(1), and SNF(3).The order of categories based on the sensitivity (or skill-score differences with scheme choices) from the largest to the smallest is RAD > SFC > RUN > BTR > INF > FRZ > ALB > SNF.For the runoff simulations, the best skill score was 0.809 but scheme combination was very different from the best one for ET: SFC(1), FRZ(1), INF(1), ALB(1), RUN(4), BTR(1), RAD(2), and SNF(2).The order of categories based on the sensitivity is: This result obviously shows that the scheme selections in RAD category are the most important to obtain higher simulation accuracies for ET (RUN for runoff simulation) due to their direct involvement in the variable estimations.It also turns out that three categories -RAD, SFC and RUN -are the most important for accurate simulations of both ET and runoff through proper combination of schemes.
It is noteworthy that the scheme sets chosen to produce the most accurate simulations are very different between ET and runoff, especially in the most influential categories (i.e., RAD, SFC, and RUN in this study).This implies that the model optimization for a single variable (e.g., ET) can degrade simulation accuracy of the other variables (e.g., runoff).This opposite effect between the ET and runoff simulations is presented in Table 3, which displays each statistical index when the model returned the best skill score for ET (upper row) and runoff (lower row).For example, the scheme combination for the best ET simulation (S = 0.806) showed a lower skill score for the runoff simulation (S = 0.693).Similarly, the scheme set for the best runoff simulation (S = 0.794) depicted a lower skill score for the ET simulation (S = 0.659).It is also observed that the normalized means (υ norm ; bold numbers in Table 3) show the largest differences between the ET and runoff simulations.For instance, the normalized mean of ET with the best skill score was 1.037, but it was lowered to 0.815 with the best scheme combination for runoff.In the same way, the normalized mean of runoff with the best skill score was 0.860, but lowered to 0.743 with the best scheme combination for ET.
This indicates that the opposite effect on the skill scores between ET and runoff is mainly reflected in the normalized means of the ET and runoff simulations.The opposite effect between the variables can make difficulty in optimizations for more than two variables.Since this opposite effect is mainly related to the quantitative estimations, interrelationship between parameterizations of physical processes associated with surface water partitioning into ET and runoff might have played a crucial role in this discrepancy.It implies that, to make the optimized scheme combinations have the best performance for the ET and runoff simulations simultaneously, uncertainties of parameterizations in the processes related to both ET and runoff should be depleted; hence further optimizations of model parameters involved in the surface water partitioning estimation are essential as the next step.

The efficiency of the MP-MGA coupled system
In the previous section, we showed how the model is sensitive to scheme choices and what scheme combination shows the highest accuracy through the stand-alone Noah-MP experiments.In this section, we address how accurately and efficiently the MP-MGA coupled system can search for the optimal scheme combinations that produce the highest skill score.
Separate experiments with the MP-MGA for ET and runoff optimization were performed to examine how fast the MP-MGA reaches the global optimum (the highest skill score).We set 10 for the number of individual members in a generation and 30 for the number of generations Figure 3 shows the evolution of generations with an increase in the number of generations.In the ET optimization process (Fig. 3a), the MP-MGA reached the maximized evolution (the point that the MP-MGA converged upon the highest average skill score) at the 18th generation and found the global optimum at the 12th generation.Meanwhile, in the runoff optimization process (Fig. 3b), MP-MGA converged upon the maximized evolution at the 12th generation and found the global optimum at the 3rd generation.The big drop in Fig. 3b indicates the "restart" point (see Sect. 2.2.1) due to the early convergence of the runoff optimization.In both the ET and runoff evolution processes, the MP-MGA didn't fall into any local optimum, and the global optimums are exactly the same as the ones from the stand-alone Noah-MP experiments shown in the previous section.However, there exists a significant difference between the ET and runoff optimizations in the speed of searching for the global optimum.This indicates that the efficiency of micro-GA to find the optimal scheme set for the best output may vary with the target variable.Although the speed of the ET optimization process is slower than that of the runoff optimization process, the MP-MGA required only 120 simulations (12 generations with 10 scheme sets each) to obtain the global optimum.
In addition, Fig. 3 also shows the optimization evolution of one target variable (say, ET) along with that of the other variable (say, runoff) for comparison.It confirms the opposite effect between the ET and runoff optimization as described in the previous section.In other words, an optimization targeting one component of water balance may debase the forecast accuracy of the other component.Specifically, our results reveal that the accuracy of ET simulation improves through  optimization but that of the runoff simulation deteriorates, and vice versa.

The natural selection mechanism in MP-MGA as model sensitivity to scheme selections
Another important capability of the MP-MGA is that it provides information on the model sensitivity to scheme choices in terms of the model accuracy through the natural selection mechanism in micro-GA.The sensitivity analysis to scheme selections can be performed more easily, but more clearly by simply counting the schemes selected via the MP-MGA.Through the natural selection mechanism, micro-GA gives more surviving chance (or selective chance) to the chromosomes (or scheme options) that have more advantage to achieve higher model accuracy.Therefore, the larger number of selections of a certain scheme through the entire optimization process accounts for its higher advantage to the model accuracy.
Figure 4 displays how many times each scheme was selected throughout the optimization process and hence which scheme is dominant in its physics category.It is noted that schemes in three categories -FRZ, INF and BTR -have mostly been selected during the optimization processes of both ET and runoff.Furthermore, each category had highly selected schemes in common for both ET and runoff optimizations, that is, FRZ(1), INF(1) and BTR(1).This clearly indicates that these highly selected schemes are  advantageous to produce better performance in both ET and runoff simulations, and that they are not likely to cause the opposite effect between the two simulations.On the other hand, some other physics categories, such as SFC, ALB, RAD, and SNF, showed counter scheme selections between the two optimization processes.For instance, in terms of the SFC category, SFC(2) had been highly selected in the ET optimization while SFC(1) had been mostly chosen in the runoff optimization.It seems that such schemes cause the opposite effect between the ET and runoff optimization (see Sect. 3.1).
Interestingly, in terms of the RUN category, the ET optimization process was dominated by two schemes -RUN(3) and RUN(4), showing similar chances of being selected.This signifies that the selection between these two schemes makes no significant difference in the ET simulation accuracy.However, considering that RUN(4) is remarkably dominant in the runoff optimization process, we can conclude that RUN(4) is the best choice for higher accuracy in both ET and runoff simulations.
Finally, the combination of the dominant schemes becomes the optimal set of schemes through the natural selection mechanism.

The MP-MGA optimization for the water balance
Additional experiments were performed for the schemebased optimization in terms of water balance (WB) without any parameter calibration.Here the WB optimization is targeting two variables simultaneously, i.e., both ET and runoff.Skill score for evaluating WB simulation (S WB ) used in this study is defined as a simple multiplication of each skill scoring function (S WB = S ET × S runoff ) from Eq. (1).Thus, S WB ranges from 0 to unity; the closer to unity the skill score is, the better performance the model has in the WB simulation.The parameter settings for micro-GA run are the same as those from the previous separate MP-MGA optimization experiments.
Figure 5a shows the evolution progress of the WB optimization in the MP-MGA coupled system.It also displays the evolution progress of the skill scores of each WB component (i.e., ET and runoff).The highest skill score obtained from MP-MGA was S WB = 0.608.This skill score was first obtained at the 12th iteration, and was exactly the same as the skill score obtained from the stand-alone Noah-MP experiments.When the combined WB optimization reached the highest S WB value, the skill score of each WB component (S ET = 0.785 and S runoff = 0.774) was lower than that from the separated MP-MGA optimization (S ET = 0.806 and S runoff = 0.794) in the previous sections.In this combined WB optimization, both the ET and runoff optimizations are performed simultaneously and the opposite effect between ET and runoff is not observed as in the separate optimizations (cf.Fig. 3).In addition, the fluctuation of evolution process in the runoff optimization is larger than that in the ET optimization, which implies that the accuracy of the WB simulation might be sensitive to the simulation performance of runoff than that of ET.
The analysis of the natural selections during the WB optimization process is displayed in Fig. 5b.Note that in the separate optimizations, categories SFC, ALB, RAD and SNF showed different scheme selections for different target variable -ET vs. runoff (see Fig. 4).In the combined WB optimization, the natural selections in categories SFC was made in favour of the runoff optimization, that is, SFC(1) was mostly selected to improve the runoff simulation accuracy.In contrast, the most highly selected scheme from category ALB and RAD in the combined WB optimization was ALB(1) and RAD(1), respectively, which were both favoured by the ET optimization.In category RUN, the RUN(4) scheme was most widely selected, which was favoured by both ET and runoff optimizations.It is interesting that category SNF had SNF(2) as the most selective scheme in the WB optimization, but it was not favoured by either ET or runoff in the separate optimizations.This implies that each physics category plays its own role in the combined WB optimizations through the natural selections in micro-GA runs.For example, in category SFC, micro-GA selects schemes with more weight to the runoff optimization, while in categories ALB and RAD it selects schemes with more weight to the ET optimization.In category RUN, a scheme is chosen in the way to give similar weight in both the ET and runoff optimization.It seems like micro-GA tend to make adjustments or balance between the ET and runoff optimization by selecting a scheme (SNF(2)) that played an important role in neither the ET optimization nor the runoff optimization.

Summary and conclusions
This study was conducted to design and test a framework for a scheme-based model optimization by coupling an intelligent model optimization technique with a multi-physics land surface model.The micro-genetic algorithm (GA), which enables smart and fast searching for the optimum, was introduced and applied to a new version of the Noah Land Surface Model (LSM) with multiple physics options (Noah-MP) in various physical parameterization categories.We implemented an interface between Noah-MP and micro-GA and developed a coupled system (MP-MGA).The optimization in terms of water balance (ET and runoff) through this MP-MGA coupled system successfully demonstrated how micro-GA can effectively seek the optimal set of the physical scheme combination from Noah-MP over the Han River basin in South Korea.Through the stand-alone Noah-MP experiment, we first examined the model sensitivity of ET and runoff simulations to scheme selections in each available parameterization category.Then we evaluated the effectiveness of MP-MGA system on the scheme-based model optimization.The searching speed for the global optimum and the optimal set of schemes is quite outstanding though the speeds are different between the ET and runoff optimization processes -only 120 of the total 1728 simulations for the ET optimization and much less for the runoff optimization.This indicates that, compared to the brute force approach, the MP-MGA coupled system significantly reduces the computing time for searching and extracting the optimal scheme combination through the natural selection and crossover mechanism in micro-GA.
In addition, this study showed a potential advantage of the MP-MGA coupled system to model diagnosis.That is, the natural selection mechanism through the micro-GA's evolutionary process of generations (multiple sets of scheme combinations) provides information on scheme sensitivity and interrelationship that is useful to build a valuable base for further calibrations and improvements.This information is beneficial especially when the optimization is performed for more than two variables.For example, we identified that a scheme set optimized for a single target variable (e.g., ET) may degrade the simulation accuracies of other variables (e.g., runoff).Since ET and runoff are mutually related, such opposite effect between the ET and runoff estimations makes difficulty to assess model accuracy in terms of surface water balance (WB).
It is also noted that the natural selection mechanism in micro-GA provides information on schemes mainly contributing to the opposite effect in the separate ET and runoff optimization.For example, the dominant schemes commonly selected in both the ET and runoff optimizations through the natural selection bring a synergistic contribution to better model performance.The dominating schemes that were selected disparately from the same categories between the two separate optimizations (e.g., SFC(2) for ET vs. SFC(1) for runoff) cause the opposite effect.In the combined WB optimization process with both ET and runoff as target variables, the most highly selected schemes, via the natural selection, played their own role to improve the model accuracy for the target variables.For example, some schemes acted upon improvement of the ET optimization while some others operated upon amelioration of the runoff optimization.
It appears that the scheme-based optimization through the MP-MGA coupled system does not assure the ultimate modelling performance we can obtain from Noah-MP.Further optimizations on uncertain parameters in parameterization schemes are still required with deep understanding of the physical parameterization methods in the model.Furthermore, additional experiments for combined scheme-and parameter-based optimization are essentially required to investigate the effect of parameter calibrations on the model sensitivity to scheme selections.However, considering that the recent trend of increased model complexity with the consequent increase of uncertain parameters may require more efforts and time for parameter optimization and calibration, the approach such as the MP-MGA coupled system provides more insightful understanding of the implemented physical schemes and their interrelationships that are essential for more effective model optimization.For instance, the optimal set of scheme combination obtained from the MP-MGA system can be a base model for faster and more effective model optimization and improvement.
Moreover, the model sensitivity to scheme selections shown in this study is likely to be region specific because the simulation performances will also vary with large-scale characteristics such as by climate, land covers, and even geographical features.For example, the schemes that showed relatively low sensitivity were related mainly to winter season variations.This might have been induced by moderately warm climate condition of the Han River basin.The regions in higher latitudes may show very different sensitivities to those schemes.Thus, additional experiments for different regions are required to obtain more generalized and extensive model applicability.

Figure 1 .
Figure 1.A flow chart describing the scheme-based optimization process from the coupled micro-GA and Noah-MP model.

Figure 2 .
Figure 2. Geographic location of the Han River basin (dark shaded area) in South Korea.The gray lines in the basin indicate the river channels.

Figure 3 .
Figure 3. Evolution of generations during the processes of the MP-MGA optimization for (a) ET and (b) runoff.The red and blue lines indicate the averages of ET and runoff in each experiment, respectively.

Figure 4 .
Figure 4. Percentage of selections of each scheme to the total number of simulations (300 simulations) from micro-GA during the entire optimization processes for (a) ET and (b) runoff.Each colour indicates the available options in each scheme category.

Figure 5 .
Figure 5. (a) Evolution of generations during the process of the MP-MGA optimization for the WB estimation as in Fig. 3, and (b) percentage of selections of each scheme to the total number of simulations during the optimization process as in Fig. 4.

Table 1 .
Summary of scheme options available in Noah-MP.

Table 2 .
Averaged statistical indices for (a) ET and (b) runoff to the observations (R: correlation coefficient; υ norm : normalized mean; σ norm : normalized standard deviation; S: skill score).Each index indicates the averaged one for simulations of all scheme combinations with one fixed-scheme in a certain scheme category.

Table 3 .
Statistical indices when the model returned with the best skill score for ET (upper row) and runoff (lower row).The bold numbers indicate the indices showing the largest differences between ET and runoff simulations.