Second review of “An unusual way to validate regional chemistry-transport models” by Lauren Menut et al.
The methodology proposed by the Authors is original and has the potential to complement the traditional approach. Unfortunately, as presented in this publication, the added value of this work remains limited because subjective judgements must be made to interpret the results. I also feel that my previous comments have been only very partially accounted for.
1. The bibliography has been extended, but the new references are hardly used in the text. For example, the decomposition of the main indicator RMSE (or MSE) into its three components (Solazzo and Galmarini, Thunis et al., Taylor et al.) could be used as a starting point to identify the different indicators and to justify the choice of correlation as the central indicator of this study. The decomposition into systematic and unsystematic error has already been done in other works that are not referenced.
2. Even if they cannot be shown in the diagram, information on the bias or on the standard deviation could be provided. Values of SN and D could be calculated for these indicators and added to the analysis. The fact that the bias is low and shows no variability (e.g. for T2m) is interesting information in itself. It could also be that the bias shows more variability for other species and then becomes the crucial parameter to analyse.
3. My point about the use of the “score” terminology has not been addressed. This term is used (55 times in the whole document) interchangeably to describe the indicator (e.g. p2 l5-6, p3 l63, p3 l87) and the value taken by this indicator (p4 l55), which makes the text confusing. Could the Authors give a definition of “score” and check that it is used consistently throughout the document?
4. The title does not reflect what the methodology is about. I agree with the Authors that the methodology is unusual, but many things can be described as unusual. I believe that the title should give insight into the novel aspect discussed in the document (the use of different meteorological years).
5. Regarding the qualitative aspects of the methodology: I agree that the indicators are calculated quantitatively, but the judgement on whether the results are good or bad remains subjective. This judgement is based on expert knowledge (e.g. that a correlation of 0.5 is very good for a given species). In my view, this limits the benefit of the methodology, as users need to know a priori what good behaviour is. I understand that the key point is the use of several years of data, but if, in the end, the interpretation of the indicator depends on expert judgement, this is a limitation. See, for example, p5 l25-27, p5 l47-50 and p6 l11.
6. I still do not understand why observations cannot be used to set a minimum threshold. I believe that values of SN and D calculated solely on the basis of the set of observations (replacing the model value with the observation of the reference year) could be computed to make the approach somewhat less subjective.
7. The English has been improved, but many misspellings and unclear sentences remain (only a few examples are given below).
8. P2 l32 require
9. P2 l64: A couple of sentences are needed to signal that the Authors are now starting the description of the methodological approach.
10. P3 l66-69: unclear
11. P3 l72: The bias is an indicator, not a score!
12. P3 l73: I disagree with the Authors: the RMSE is not necessarily driven by the bias. Depending on the variable and the period considered, the RMSE can be dominated by the correlation, the bias or the standard deviation.
13. P3 l78: unclear formulation, please re-phrase.
14. Figure 1: I assume MYV should be Imv.
15. P5 l23: have --> has
16. P5: why is there a subscript “s” in Ds?
17. Figure 2: MYV should presumably be Imv.
18. P5 l70: unclear, please re-phrase
19. P6 l14: year --> years
20. P6 l16-18: please re-phrase
21. P6 l27-28: I disagree; see point 12.
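To illustrate points 1, 12 and 21, the following minimal sketch shows one common form of the MSE decomposition, MSE = (mean_m − mean_o)² + (σ_m − σ_o)² + 2σ_mσ_o(1 − r), and how each of the three terms can dominate depending on the series. It is an illustration on made-up toy series, not the Authors' code; the function name `mse_decomposition` is hypothetical, and population (1/n) moments are assumed so that the three terms sum exactly to the MSE.

```python
import math


def mse_decomposition(model, obs):
    """Split the mean squared error (MSE) into three additive terms.

    Assumes population (1/n) moments, so that
        MSE = (mean_m - mean_o)**2 + (sd_m - sd_o)**2 + 2*sd_m*sd_o*(1 - r)
    holds exactly, up to floating-point rounding.
    """
    n = len(obs)
    if n == 0 or len(model) != n:
        raise ValueError("model and obs must be non-empty and of equal length")
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    sd_m = math.sqrt(sum((m - mean_m) ** 2 for m in model) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    if sd_m == 0 or sd_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs)) / n
    r = cov / (sd_m * sd_o)  # Pearson correlation coefficient
    return {
        "mse": sum((m - o) ** 2 for m, o in zip(model, obs)) / n,
        "bias_term": (mean_m - mean_o) ** 2,     # squared mean bias
        "sd_term": (sd_m - sd_o) ** 2,           # standard deviation mismatch
        "corr_term": 2 * sd_m * sd_o * (1 - r),  # lack of correlation
    }


# Toy series (not real data): each case has its MSE dominated by a different term.
obs = [1.0, 3.0, 2.0, 4.0]
cases = {
    "offset by +2": [o + 2.0 for o in obs],    # bias_term ~ 4, others ~ 0
    "reversed in time": [4.0, 2.0, 3.0, 1.0],  # corr_term ~ 5, others ~ 0
    "amplitude x3": [-2.0, 4.0, 1.0, 7.0],     # sd_term ~ 5, others ~ 0
}
for name, model in cases.items():
    terms = mse_decomposition(model, obs)
    print(name, {k: round(v, 3) for k, v in terms.items()})
```

The three toy cases have comparable MSE values, yet each is explained by a different component, which is why a statement that the RMSE is "driven by bias" cannot hold in general.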