The authors have mostly addressed my comments, and I am happy about the addition of appendix B. However, results from the additional twin simulation experiment (TSE) highlight differences between parameter estimation experiments that include 5 versus 17 observation types. Yet, this aspect is not examined adequately in the manuscript.
# general comments
The introduction provides a nice overview of recent parameter estimation studies for BGC ocean models. However, it focuses mostly on the computational difficulties associated with the estimation of a large number of uncertain parameters. Two other major problems are not mentioned: parameter dependency and limited data availability (not in the spatial or temporal sense, but in the type of data commonly observed). For example, low phytoplankton growth and mortality rates may yield similar phytoplankton model estimates to high growth and mortality rates -- the values of phytoplankton growth and mortality rates are difficult to determine with phytoplankton data alone. In terms of data availability, chlorophyll data (the most commonly used data type for BGC data assimilation) may be useful for constraining phytoplankton parameters -- with the caveats just mentioned above -- but are almost useless for estimating a parameter related to detritus, e.g., a decomposition rate. Both of these problems are more likely to occur in BGC models with large numbers of uncertain parameters. In fact, they may explain some of the differences seen between the TSE in which all state variables are observed (the 30-day TSE) and that in which only 5 variables are observed (the annual TSE). Fewer data types constrain fewer parameters and, since the objective function is based on fewer data types, sensitivity values can differ greatly.
Here, it is unfortunate that the two TSEs in question differ in two aspects (number of observed variables/data types and length of the observations), so it becomes difficult to estimate to what degree each aspect causes the difference in the results. I would suggest a new experiment in which only one of the two aspects is changed from the 30-day TSE, but I can understand if this is not possible due to computational constraints.
Based on my comment above, and even without additional experiments, the authors should mention and discuss parameter dependency and limited data availability as important parameter estimation problems in the introduction, which is currently very much focused on computational issues. Furthermore, the difference in the number of observed variables should be emphasized more in section 4.1 when interpreting the results of the TSEs. The expanded interpretation of results may also warrant a new paragraph in the Conclusions section.
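To illustrate the parameter dependency mentioned in the general comments above, here is a minimal sketch. It is not the authors' model: it assumes a hypothetical one-box equation dP/dt = (growth - mortality) * P, and the function names, rate values, and the uncertainty `sigma` are all made up for illustration. Because only the difference between the two rates enters the solution, a low-rate and a high-rate parameter pair fit the same synthetic phytoplankton "observations" equally well.

```python
import math

def phytoplankton(t_days, growth, mortality, p0=1.0):
    """Analytic solution of the assumed toy model dP/dt = (growth - mortality) * P."""
    return p0 * math.exp((growth - mortality) * t_days)

def misfit(obs, times, growth, mortality, sigma=0.1):
    """Sum of squared model-minus-observation errors, normalized by an assumed sigma."""
    return sum(
        ((phytoplankton(t, growth, mortality) - o) / sigma) ** 2
        for t, o in zip(times, obs)
    )

# Synthetic observations generated from a low-rate "truth" (a twin-experiment setup).
times = [0, 5, 10, 15, 20, 25, 30]
obs = [phytoplankton(t, growth=0.5, mortality=0.45) for t in times]

low = misfit(obs, times, growth=0.5, mortality=0.45)    # the true pair
high = misfit(obs, times, growth=1.5, mortality=1.45)   # same net rate, much higher rates
other = misfit(obs, times, growth=1.5, mortality=0.45)  # different net rate

print(f"low-rate pair:  {low:.3e}")
print(f"high-rate pair: {high:.3e}")
print(f"mismatched net: {other:.3e}")
```

In this toy setting, phytoplankton data alone cannot distinguish the first two parameter pairs; an additional data type that responds to mortality (e.g., detritus) would be needed to separate them.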
# specific comments (line numbers correspond to tracked-changes document)
L 4: "ocean high-dimensional (in parameter space) BGC models" → "high-dimensional (in parameter space) BGC ocean models"
L 9: "objective functions": some readers may be more familiar with the terms "cost function" or "loss function" and I would suggest adding a brief description (such as "which quantifies the error between observations and the model estimate") to make the abstract more accessible.
L 27: "Subsequent to verification using the TSE, we use the method to estimate parameters for the two sites, both individually and together.": Mention here that "real" data are used for this.
L 102: "these weights can be adjusted as desired, for example [...] to provide greater weight to state variables for which observational uncertainties are smaller": While true, this statement makes it appear as if this objective function does not provide greater weight to state variables for which observational uncertainties are smaller, when in fact it does so via the \sigmas. I would suggest changing the statement to "these weights can be adjusted as desired, for example [...] to provide greater weight to select ocean sites".
Fig 1. caption: "calculates the error": I would suggest using "calculates the current value of the objective function" just because the diagram shows the "Objective Function Calculator" and no mention of "error".
Fig 5. caption: Mention what S-hat is. Consider adding a shaded area or dotted blue lines to indicate the +/-5% range around the baseline value, indicating that a parameter is considered "recovered".
Fig 5. caption: Consider using "p-hat_i - p-hat_O, the difference between the test and baseline normalized parameter values" instead of "the difference between the test and baseline normalized parameter values, p-hat_i and p-hat_O, respectively"; the former directs the reader's eyes directly to "p-hat_i - p-hat_O" on the right y-axis.
Fig 5. caption: To me personally, the sentence "The parameter values are normalized between 0 and 1 based on the upper and lower bounds for each parameter." is not helpful here.
L 230: "Five parameters fell into this category": Due to changes in the text preceding this statement, it is no longer clear what "this category" refers to; I would suggest rephrasing it.
L 231: "maximum objective function evaluation for each parameter between the positive and negative 5% perturbation cases": The "between" makes it sound like all parameter values in the +/- 5% range were tested, which I do not think is the case. I would further suggest using "value" instead of "evaluation".
L 480: "This shows that the water is clearer than initially estimated.": I would be a bit more careful in the wording when interpreting estimated parameter values; my suggestion: "This result suggests that the water is clearer than initially estimated."
L 636: "The run time for a single model evaluation is approximately 5 min.": Was this run on a single node, a single core, or some other configuration?
Fig. B1: The apparent lack of correlation between initial sampling and final objective function values may suggest that it is not important to start the gradient-based optimization at a low value, but rather to start the optimization at different locations in parameter space to avoid local minima. But any additional experiments would be beyond the scope of this study.