Second review of Xu et al. “Parameter calibration in global soil carbon models using surrogate-based optimization”.
This paper tests the performance of a surrogate modeling approach for estimating parameters in three soil carbon models, relative to a selection of commonly used direct optimization approaches and an MCMC approach. The results suggest that the surrogate modeling approach is amongst the most effective and is the most efficient (by the authors' measure at least).
The descriptions of the surrogate modeling and optimization approaches are relatively complete, but details about the models themselves, and how they have been used, are incomplete and confusing. This distracts from the main aspects of the work detailing the optimizations. The confusion is not helped by the authors repeatedly making claims about working with Earth system models and global land surface models, when in reality they are working with a matrix approximation of a sub-component of such complex models, or with models that are not part of such systems at all. Whilst it is exciting to see such approaches being applied in this way, the usability and applicability with fully functional ESMs must not be overstated.
It is unclear how the three soil C models were actually run. What are the inputs? What is the climate forcing? Where do they come from? Are they a time series, or mean values used to estimate a steady state solution? Rather than describing the model equations – which could be moved to appendices – a more holistic description of the modeling approach, including details of the domain, forcing, and NPP inputs, would be beneficial.
Related to this is confusion as to how the CLM-CASA’ C-only model is being used. Section 2.1 talks about the matrix approximation in general terms, but not specifically about its use in this case. Whilst acknowledging that there are references describing this, some detail is required directly in this manuscript. It is important to have clarity here, as one of the main arguments you make is how beneficial surrogate models are when working with computationally expensive, highly complex Earth system models. Yet, in reality, the surrogate model is being built for the matrix approximation, which was itself developed to address many of the issues related to parameterization of ESMs. The manuscript often describes the days to weeks of wall clock time associated with MCMC model runs in comparison to the efficiency of the surrogate. But is this really the case with the matrix model? Similarly, the 2- and 4-pool models are not highly complex (relative to an ESM), and it is unclear whether there are real-world benefits from developing a surrogate for such models. That is to say, is the efficiency gain really worth the expense of developing the surrogate for a model of this complexity?
The results themselves are encouraging, and are what might be expected, with the exception of the relatively poor performance of the MCMC scheme in the case of the 2- and 4-pool models. The authors suggest this “may be mainly due to the different targets of the parameter selection”. This needs further elaboration: it is not clear what the statement means, and the result is a surprising one.
Figure 2 needs some clarification. Why are the data so “blocky”, and why do they cover such a limited extent of the land mass? Is that the domain over which the model was actually run?
It seems that much of the appendix replicates Section 3.3, but is actually better written and clearer. I would suggest merging the two into Section 3.3 alone.
The quality of the English is very variable across different sections of the manuscript. The abstract and introduction are very poor, with numerous language errors, far too many for this reviewer to list individually. Any resubmission will require thorough proofreading.