Comment on gmd-2021-251

This paper proposes an objective method to tune ESMs in a global approach. The authors propose a mathematical equivalent of the way modellers tune their models, based on a Green's functions approach, so as to rationalize and automate the tuning process. This is a very important topic for ESM developers, who face a limitation in model development due to the long process of human-made iterative tuning. This study is perfectly timely. The authors provide an example of application and discuss some of the assumptions made. However, some assumptions are not fully explained or would deserve more comment. I therefore recommend the manuscript be accepted after minor revisions.

The choices of error covariance matrices should be better detailed, as well as the methodology used to set them. For instance, in Section 4.3 it is said that "R is chosen diagonal": apart from being mathematically convenient, what are the physical hypotheses behind such a choice? What makes this choice sensible? Isn't it a strong hypothesis? What happens if reality is far from diagonal? These points should be discussed from a physical point of view in the manuscript to better illustrate the advantages and flaws of the method used.
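To make the concern concrete, here is a toy sketch with invented numbers (not the authors' setup): when observation errors are in fact correlated, assuming a diagonal R over-penalizes residuals that the true covariance would largely explain, distorting the cost.

```python
import numpy as np

# Toy illustration (invented numbers): a diagonal R assumes observation
# errors are uncorrelated, so the cost J = r^T R^{-1} r can differ strongly
# from the cost under the "true", correlated error covariance.

r = np.array([0.3, 0.3])                       # residuals at two nearby obs
R_diag = np.diag([0.1, 0.1])                   # assumed: uncorrelated errors
R_full = np.array([[0.1, 0.09], [0.09, 0.1]])  # "true": strongly correlated

J_diag = r @ np.linalg.solve(R_diag, r)        # cost with the diagonal choice
J_full = r @ np.linalg.solve(R_full, r)        # cost with the full covariance
```

With same-sign residuals at correlated points, the diagonal assumption roughly doubles the cost relative to the full covariance, which is why the physical justification matters.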
Similarly, Section 5.1 is a bit hard to read. The physical idea behind the choice of functions is not clear enough. What is the physical meaning of the proposed choices of Q? This section needs more detail to better convey the physical hypotheses made with each choice of covariance matrix.
There is also a crucial point that should at least be discussed in the paper: in such complex ESMs there is a long spin-up process of several decades that has to be taken into account. If only the first 10 years of the ESM simulations are used for tuning, what is the effect of such short-term tuning on the long-term equilibrium? How do the authors propose to handle this problem?
Line 230: the fact that the surface temperature is less well reproduced after the tuning process is very interesting. Even if it reflects a flaw in the present choice of observational targets, the discussion of this should be enriched. What other types of observational targets could be proposed to avoid this? Section 5.2: by increasing the number of parameters, there is a risk of over-fitting the model to the chosen set of observations, as discussed in Dommenget and Rezny (2018). This risk should be discussed in the last section. Additionally, by increasing the number of parameters the final fit is necessarily improved, as more degrees of freedom have been used. Could the authors use a method similar to the LASSO technique (Tibshirani, 1996, used in linear regression) to penalize the results according to the number of degrees of freedom? This would help select only the most useful parameters and avoid overfitting.
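As a sketch of this suggestion (entirely synthetic data, not the authors' Jacobian or parameters): an L1 penalty on the linearized parameter update drives weakly constrained parameters exactly to zero, giving an automatic selection of the most useful ones.

```python
import numpy as np

# Hypothetical sketch: an L1 (LASSO) penalty to select the most useful tuning
# parameters in a linearized (Green's-function-like) setting. J stands in for
# the sensitivity matrix d(model)/d(param); all numbers here are synthetic.

rng = np.random.default_rng(0)
n_obs, n_par = 50, 8
J = rng.normal(size=(n_obs, n_par))
true_dp = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0, 0.5, 0.0])
misfit = J @ true_dp + 0.01 * rng.normal(size=n_obs)   # obs-minus-model misfit

def lasso_cd(J, y, lam, n_iter=500):
    """Minimise ||y - J p||^2 / (2 n) + lam * ||p||_1 by coordinate descent."""
    n = len(y)
    p = np.zeros(J.shape[1])
    col_sq = (J ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(J.shape[1]):
            r = y - J @ p + J[:, j] * p[j]             # partial residual
            rho = J[:, j] @ r / n
            p[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return p

p_hat = lasso_cd(J, misfit, lam=0.1)
# Parameters with no real influence are shrunk exactly to zero, so only the
# useful ones are retained; varying lam trades fit quality against sparsity.
```

Sweeping the penalty strength would give a principled alternative to simply counting how many parameters to optimize.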
Ln 304-305: "The largest cost reduction was found between 5 to 10 optimized parameters." And what about the ratio of projected cost to optimized cost? Maybe this information could be added to Figure 6. Is this ratio stable as the number of parameters varies?
Section 5.4: what would the cost be if the first 5 years were used? Would it differ from the last five years? If yes, this would probably indicate that the spin-up is important.
But then, what about taking the last 10 years of a 20-year simulation? The initial choice of 10 years appears a bit short compared with the spin-up timescale of coupled models. The spin-up problem should be discussed more thoroughly. It might also be useful to show the model's initial drift.
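The check I have in mind could look like the following sketch (synthetic annual-mean series, not the authors' output): compare the misfit cost over the first and last halves of each run; a large gap would flag that spin-up drift contaminates the tuning.

```python
import numpy as np

# Hypothetical drift check (synthetic data): a decaying spin-up transient
# inflates the misfit over the first 5 years relative to the last 5 years.

rng = np.random.default_rng(1)
years = np.arange(10)
drift = 0.2 * np.exp(-years / 5.0)                   # spin-up transient
field = 1.0 + drift + 0.01 * rng.normal(size=10)     # toy annual-mean series
target = 1.0                                         # observational target

cost_first = np.mean((field[:5] - target) ** 2)      # early, drift-affected
cost_last = np.mean((field[5:] - target) ** 2)       # later, closer to balance
```

If `cost_first` greatly exceeds `cost_last` in the actual runs, the 10-year window is probably aliasing drift into the parameter estimates.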
I may have misunderstood the methodology, but on re-reading the introduction, I am not convinced that the proposed methodology addresses all the points raised there.
-ln 49-52: "An experiment with both perturbed would not reveal which had the greater impact on the solution, and experiments with only one at a time perturbed would not behave as an experiment with both perturbed. The sequential methodology, therefore, for this example, may result in a sub-optimal combination of these parameters." I do not understand how the Green's function approach does something different from "one at a time" perturbed experiments. Maybe it is just a question of formulation here?
-ln 58: "The second drawback is the large number of needed simulations, since a new set of experiments is required for each set of observational targets." Using the standard by-hand method, developers usually look at a panel of diagnostics and do not need to redo the experiments for a new diagnostic, so I do not see any difference on this point. This point is repeated on line 74; I would not emphasize it this much.
-ln 71-71: "The interdependence is accounted for in the constraints on the error covariance for each parameter." This point about interdependence is not well developed in the manuscript. I would encourage the authors to illustrate it better.
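On the first point above (ln 49-52), for what it is worth, my reading of the Green's function formulation is that the one-at-a-time runs are not used sequentially but combined in a single joint least-squares solve. A toy sketch with an invented two-parameter linear model (not the ESM) of what I understand the distinction to be:

```python
import numpy as np

# Toy sketch: single-parameter perturbation runs give the columns of the
# Jacobian (the Green's functions); the jointly optimal update then comes
# from ONE least-squares solve over all parameters, not from tuning them
# one after another. The "model" here is an invented 2x2 linear toy.

def model(p):                       # toy forward model with interacting params
    return np.array([p[0] + 0.5 * p[1], 0.5 * p[0] + p[1]])

obs = np.array([1.0, 1.0])          # observational targets
p0 = np.zeros(2)                    # reference parameter values
eps = 1e-3

# One perturbed run per parameter -> one Jacobian column each
G = np.column_stack([
    (model(p0 + eps * np.eye(2)[j]) - model(p0)) / eps for j in range(2)
])

# Joint update over ALL parameters at once
dp, *_ = np.linalg.lstsq(G, obs - model(p0), rcond=None)
```

A sequential "fit p0, then fit p1 given p0" procedure on the same toy converges to a different, sub-optimal pair, which may be the distinction the authors intend; if so, stating it this explicitly would help.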

Miscellaneous
Section 2: please indicate the ocean model time step and separate the ocean and atmosphere model descriptions into two distinct paragraphs. Tables 4 and 5: these tables are large and difficult to interpret as is, the exact values not being of strong interest. Please consider replacing them with plots.
Section 5.2: concerning LW and SW radiation, is the change from downward to net? Why change this? Could the authors discuss this choice a bit more?