This paper uses the adjoint technique, the L-BFGS-B optimization algorithm, and a Gibbs sampler to optimize the parameters of JULES. Generally speaking, the paper is well written, having already been reviewed and revised, but there are three problems I have to point out. (1) Data assimilation. The paper uses the adjoint to provide gradient information. Although the adjoint technique has been used in data assimilation, what is done here is not actually data assimilation, so I suggest replacing 'data assimilation' in the title and throughout the main body with 'adjoint'. (2) Local optima. If I understand correctly, the gradient-based L-BFGS-B algorithm finds a local optimum; if there are multiple local optima, how does the method escape them to find the global one? (3) Posterior distribution. The posterior distribution looks really odd, and I suspect the reason is an overly strong assumption: the truncated multivariate normal distribution. If the true posterior is not Gaussian, the results given by a Gibbs sampler under such a strong assumption might be very misleading.
(1) Title: data assimilation
What is the definition of 'data assimilation' in your paper? With all due respect, this title is really misleading. Usually, data assimilation means adjusting the state variables, such as soil moisture, temperature, and pressure, using observed values, while parameter optimization means tuning the parameters according to the deviation between observed and simulated output. Although they share some techniques, the two concepts are fundamentally different. The Kalman filter, 3D-Var, and 4D-Var are very popular in the data assimilation community, while in the parameter optimization community, genetic algorithms, Metropolis-Hastings, Gibbs samplers, etc., are more frequently used.
Consequently, it is really odd to use 'optimization through data assimilation' in the title of this paper, because the adjoint (Hessian) of JULES, gradient-based L-BFGS-B optimization, and a Gibbs sampler were used to tune the parameters, while the state variables remained untouched. Although the adjoint technique has been used in data assimilation, it is not equivalent to data assimilation. I think it might be better to replace 'data assimilation' with 'adjoint'.
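To make the distinction concrete, here is a minimal scalar Kalman analysis step (a toy example of my own, not taken from the paper, with invented numbers): data assimilation nudges the *state* toward an observation, while the model parameters are left untouched.

```python
# One scalar Kalman analysis step: data assimilation adjusts the state
# (e.g. soil moisture) toward an observation; model parameters stay fixed.
x_b, P_b = 0.30, 0.04   # background state estimate and its error variance (toy values)
y, R = 0.38, 0.01       # observation and observation-error variance (toy values)

K = P_b / (P_b + R)          # Kalman gain
x_a = x_b + K * (y - x_b)    # analysis: updated state estimate
P_a = (1 - K) * P_b          # analysis error variance

print(x_a, P_a)              # state moves toward the observation; variance shrinks
```

Parameter optimization, in contrast, would leave x_b alone and instead adjust the model's internal constants to reduce the misfit between simulated and observed output.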
(2) Page 2, line 5: two major sources: (a) process uncertainty, and (b) parameter uncertainty…
You are missing another important source: the uncertainty due to initial and boundary conditions (forcing data, for land surface models) [Kavetski et al., 2006a, 2006b; Ajami et al., 2007].
(3) Page 3, line 15, Page 5, line 1 and 12, etc.
Replace 'data assimilation' with 'adjoint-based optimization' or some other term that accurately describes the methodology used in this paper. See my previous comment (1).
(4) Page 6, line 12: gradient descent algorithm L-BFGS-B
Page 6, line 22: locally optimal parameter vector
Page 6, line 24: locally optimized parameters
This paper uses the gradient-descent algorithm L-BFGS-B as the optimization method, with the gradient information provided by the adjoint. My concern is: if the optimization algorithm is gradient-based, how does it deal with local optima? If the response surface has multiple local optima, gradient-based methods will become trapped in one of them and fail to find the global optimum. Many approaches exist for finding the global optimum, such as genetic algorithms [Wang, 1991], SCE-UA [Duan et al., 1992], or multi-start methods [Krityakierne and Shoemaker, 2015]. Although the performance of JULES has been significantly improved, it is still not convincing to me, because the model performance could be improved further if the algorithm found the global optimum. Please provide convincing evidence that the algorithm can successfully escape local optima, or use another algorithm to find the global optimum.
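The multi-start idea is cheap to combine with the authors' existing optimizer. A minimal sketch (my own toy illustration: an assumed multimodal cost function stands in for the JULES misfit, and SciPy's L-BFGS-B plays the role of the adjoint-driven optimizer):

```python
import numpy as np
from scipy.optimize import minimize

# Toy multimodal cost function standing in for the JULES misfit
# (hypothetical; in the real case each evaluation is a model run).
def cost(x):
    return np.sum(x**2) + 2.0 * np.sum(np.cos(3.0 * x))

rng = np.random.default_rng(0)
bounds = [(-5.0, 5.0)] * 2

# Multi-start: run the same gradient-based optimizer from many random
# initial points and keep the best local optimum found.
results = [
    minimize(cost, rng.uniform(-5.0, 5.0, size=2),
             method="L-BFGS-B", bounds=bounds)
    for _ in range(40)
]
best = min(results, key=lambda r: r.fun)
print(best.x, best.fun)
```

A single start from the wrong basin stops at a local optimum; the best-of-many result is the practical safeguard that I am asking the authors to demonstrate.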
(5) Page 9, line 17: a truncated multivariate normal distribution … using Gibbs sampling…
This is actually a very strong assumption about the posterior distribution. If you want to obtain the posterior distribution of the parameters but do not know much about its shape, more general approaches, such as Metropolis-Hastings [Hastings, 1970], adaptive Metropolis [Haario et al., 2001], or DRAM [Haario et al., 2006], might be better.
I think the reason for the strongly correlated joint posterior distribution is your strong assumption of a truncated multivariate normal distribution. If you assume the posterior to be multivariate normal, it is impossible to obtain any other shape of distribution. In my experience, the response surfaces of land surface models are usually not Gaussian, i.e., they have multiple local optima, valleys, and peaks, so such a strong assumption can give totally misleading results. I therefore suggest replacing the Gibbs sampler with more general MCMC approaches such as MH, AM, or DRAM.
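The point is that a random-walk Metropolis-Hastings sampler needs only point evaluations of the unnormalized posterior and imposes no shape assumption on it. A minimal sketch (my own toy example; a skewed Gamma-shaped target stands in for a JULES parameter posterior):

```python
import numpy as np

# Toy non-Gaussian target: an (unnormalized) Gamma(shape=3, rate=1)
# log-posterior for one positive parameter. Hypothetical stand-in;
# the real case would evaluate the model misfit here.
def log_post(x):
    return 2.0 * np.log(x) - x if x > 0 else -np.inf

rng = np.random.default_rng(1)
x, lp = 3.0, log_post(3.0)
samples = []

# Random-walk Metropolis-Hastings: the *proposal* is Gaussian, but the
# sampled posterior is whatever log_post defines -- no normality assumed.
for _ in range(30000):
    prop = x + rng.normal(scale=1.0)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
        x, lp = prop, lp_prop
    samples.append(x)

chain = np.array(samples[5000:])   # discard burn-in
print(chain.mean())                # close to 3, the true Gamma mean
```

The recovered histogram is right-skewed and bounded at zero, something a truncated-Gaussian Gibbs scheme could not represent faithfully.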
(6) Page 14. Section 3.2.1 should be section 4.
(7) Page 17, line 10: Alternative methods … could avoid this issue, but are more computationally costly.
How many model evaluations did the optimization in this paper require? Some classical global optimization methods, such as SCE-UA [Duan et al., 1994], are able to converge within several thousand model evaluations, while the most recent surrogate-based optimization algorithms [Wang et al., 2014; Gong et al., 2015, 2016] can reach the global optimum with only hundreds of model evaluations. Global optimization methods are not as costly as you suggest.
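Such a budget is also easy to report: wrapping the cost function with a counter gives the exact number of model evaluations. A toy sketch (my own example, using SciPy's differential evolution as a representative global optimizer on an assumed multimodal test function; for JULES each call would be one model run):

```python
import numpy as np
from scipy.optimize import differential_evolution

# Count model evaluations by wrapping the cost function. The toy cost
# below is a cheap multimodal stand-in for the real model misfit.
n_evals = 0
def cost(x):
    global n_evals
    n_evals += 1
    return np.sum(x**2) + 2.0 * np.sum(np.cos(3.0 * x))

result = differential_evolution(cost, bounds=[(-5.0, 5.0)] * 2,
                                seed=0, maxiter=100, popsize=15, tol=1e-6)
print(result.fun, n_evals)   # best cost found and the evaluation budget used
```

Reporting n_evals alongside the result would let readers compare the adjoint approach directly against global methods on cost.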
Ajami, N. K., Q. Y. Duan, and S. Sorooshian (2007), An integrated hydrologic Bayesian multimodel combination framework: Confronting input, parameter, and model structural uncertainty in hydrologic prediction, Water Resour. Res., 43, W01403, doi:10.1029/2005wr004745.
Duan, Q. Y., S. Sorooshian, and V. K. Gupta (1992), Effective and Efficient Global Optimization for Conceptual Rainfall-Runoff Models, Water Resour. Res., 28(4), 1015–1031, doi:10.1029/91wr02985.
Duan, Q. Y., S. Sorooshian, and V. K. Gupta (1994), Optimal Use of the SCE-UA Global Optimization Method for Calibrating Watershed Models, J. Hydrol., 158(3-4), 265–284, doi:10.1016/0022-1694(94)90057-4.
Gong, W., Q. Duan, J. Li, C. Wang, Z. Di, Y. Dai, A. Ye, and C. Miao (2015), Multi-objective parameter optimization of common land model using adaptive surrogate modeling, Hydrol Earth Syst Sci, 19(5), 2409–2425, doi:10.5194/hess-19-2409-2015.
Gong, W., Q. Duan, J. Li, C. Wang, Z. Di, A. Ye, C. Miao, and Y. Dai (2016), Multiobjective adaptive surrogate modeling-based optimization for parameter estimation of large, complex geophysical models, Water Resour. Res., 52, 1984–2008, doi:10.1002/2015WR018230.
Haario, H., E. Saksman, and J. Tamminen (2001), An adaptive Metropolis algorithm, Bernoulli, 7(2), 223–242, doi:10.2307/3318737.
Haario, H., M. Laine, A. Mira, and E. Saksman (2006), DRAM: Efficient adaptive MCMC, Stat. Comput., 16(4), 339–354, doi:10.1007/s11222-006-9438-0.
Hastings, W. K. (1970), Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57(1), 97–109, doi:10.1093/biomet/57.1.97.
Kavetski, D., G. Kuczera, and S. W. Franks (2006a), Bayesian analysis of input uncertainty in hydrological modeling: 1. Theory, Water Resour. Res., 42, W03407, doi:10.1029/2005wr004368.
Kavetski, D., G. Kuczera, and S. W. Franks (2006b), Bayesian analysis of input uncertainty in hydrological modeling: 2. Application, Water Resour. Res., 42, W03408, doi:10.1029/2005wr004376.
Krityakierne, T., and C. A. Shoemaker (2015), SOMS: SurrOgate MultiStart algorithm for use with nonlinear programming for global optimization, Int. Trans. Oper. Res., doi:10.1111/itor.12190.
Wang, C., Q. Duan, W. Gong, A. Ye, Z. Di, and C. Miao (2014), An evaluation of adaptive surrogate modeling based optimization with two benchmark problems, Environ. Model. Softw., 60(0), 167–179, doi:10.1016/j.envsoft.2014.05.026.
Wang, Q. J. (1991), The Genetic Algorithm and Its Application to Calibrating Conceptual Rainfall-Runoff Models, Water Resour. Res., 27(9), 2467–2471.