First, regarding the observation error, which includes data, model (and representation) errors. I agree the χ2r test is a good way to assess whether the observational error is well defined. However, for the optimisation problem at hand, I have three comments:
1. This test is chi-squared per degree of freedom: χ2r = χ2 / N, where N is the number of degrees of freedom, i.e., the number of observations minus the number of parameters (e.g., Taylor, 1997). By including only the former, the authors might thus have underestimated the value of χ2r.
2. The authors state that “Because representation errors will be large at low-resolution this analysis cannot be performed using the low-resolution model used elsewhere in this analysis.” (P7, L7-8). I wonder, then, what is the informative value of a test performed on a configuration different from that used for the rest of the study? All the more so since this “representation error” directly influences the value of χ2r, as it is included in the observational error covariance matrix Cd.
3. More generally, it seems to me that the value of χ2r should in any case be expected to be larger than one. Indeed, no optimization is performed in this study, so the model-data fit is unlikely to be excellent, and the χ2r value precisely reflects a goodness-of-fit given the trust put in the observations. Contrary to what is stated in that paragraph (P7, L5-11), my understanding is that a value of χ2r lower than one should not be a target, as it reflects overfitting and underestimation of observational error (too much trust in the observations, i.e. model outputs and/or data), all the more so if the prior model-data fit is not expected to be great. Note that in Kuppel et al. (2013), χ2r was estimated using the optimised model (Eq. 6), albeit with a slightly different formulation following Tarantola (2005).
I would thus recommend that the authors assess the reduced chi-squared statistic at the same resolution as that used for the rest of the analysis, and base their discussion on it. The advantage is that the impacts of both neglected structural and representation errors are included here. If the value of χ2r is significantly larger than one, it can arise from underestimated structural error, underestimated representation error, and/or model-data misfit. Acknowledging this from the start and for the discussion of the presented parametric uncertainty reduction would give more depth to the paper.
Second is the systematic error. I have a hard time understanding how the authors can propagate a “systematic error of unknown sign”, for three reasons:
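To make comment 1 concrete, here is a minimal Python sketch of the normalisation I have in mind. It is only an illustration under my own assumptions: the function name, the toy residuals and the diagonal form of Cd (independent errors with standard deviations `sigmas`) are hypothetical and do not come from the manuscript.

```python
def reduced_chi_squared(residuals, sigmas, n_params):
    """Chi-squared per degree of freedom, assuming a diagonal Cd.

    residuals : model-minus-data differences (hypothetical toy values below)
    sigmas    : observational standard deviations, i.e. sqrt(diag(Cd))
    n_params  : number of parameters subtracted from the observation count
    """
    n_obs = len(residuals)
    dof = n_obs - n_params  # degrees of freedom, N in the comment above
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((r / s) ** 2 for r, s in zip(residuals, sigmas))
    return chi2 / dof


# Toy numbers only: four observations with unit errors.
residuals = [1.0, -1.0, 2.0, 0.0]
sigmas = [1.0, 1.0, 1.0, 1.0]

# Dividing by the number of observations only (n_params = 0) ...
chi2r_obs_only = reduced_chi_squared(residuals, sigmas, n_params=0)
# ... versus dividing by observations minus two (hypothetical) parameters.
chi2r_dof = reduced_chi_squared(residuals, sigmas, n_params=2)
```

With these toy values the statistic is smaller when only the number of observations is used as the divisor, which is the direction of the possible underestimation raised in comment 1. How large the effect is in practice depends on the ratio of parameters to observations in the authors' setup.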
1. the very basis of error propagation used in this study assumes Gaussian error that can be propagated linearly, i.e., using matrices (P3, L14-22). The fact that the sign of the systematic error is not known does not necessarily make it suitable for statistical analysis, let alone be characterized as Gaussian. How are the authors including this “systematic random error” of ± 0.1 W.m-2.μm-1.sr-1 in their propagation framework? What “seasonality” is applied to it? Note that the sign of a bias is indeed generally known and its magnitude rarely is, but that is not enough for statistical analysis (Richardson et al., 2012, pp. 175-177).
2. The authors claim (P9, L21-22) to perform “a sensitivity test of incorporating this systematic uncertainty into the error propagation system to indicate how an error in the zero-level offset may propagate through to uncertainty in GPP.” Following my previous comment, I was curious to see how the authors applied their methodology. However, I could not find where the results are reported. What, then, is the basis for asserting that “We also find that the effect of incorporating the error from the zero-level offset in the SIF observations is negligible on posterior parametric uncertainties” in the Discussion (P19, L19-20)?
3. Later in the Discussion, the authors state that “A known systematic error in forcing variables (e.g. Boilley and Wald, 2015) cannot be considered in the present error propagation system, however, in such a case a correction to the data should be performed as it will bias carbon flux estimates”. Why not apply the same framework for measurement error?
As it is formulated, the methodology is imprecise and the results are not reported, so it is hard to assess what the authors have actually done.
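To illustrate why I distinguish a constant offset from a random error in point 1, here is a small Python sketch with toy numbers. Everything in it is my own assumption for illustration: the "true" values and the random errors are hypothetical, and only the offset magnitude of 0.1 is borrowed from the value quoted above. It does not reproduce the authors' propagation system.

```python
# Toy illustration: each observation = true value + random error + constant offset z.
d_true = [1.0, 2.0, 3.0, 4.0]        # hypothetical true values
eps = [0.05, -0.05, 0.02, -0.02]     # hypothetical random errors, chosen to sum to zero
z = 0.1                              # constant offset; its sign is assumed positive here


def mean_error(d_obs, d_ref):
    """Average difference between observed and reference values."""
    return sum(o - r for o, r in zip(d_obs, d_ref)) / len(d_obs)


d_random_only = [t + e for t, e in zip(d_true, eps)]
d_with_offset = [t + e + z for t, e in zip(d_true, eps)]

bias_random_only = mean_error(d_random_only, d_true)  # random part averages out
bias_with_offset = mean_error(d_with_offset, d_true)  # offset remains after averaging
```

In this toy case the random errors cancel in the mean while the offset is recovered in full, whatever the number of observations. This is the behaviour I would like the authors to reconcile with a Gaussian, matrix-based propagation.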
Other comments
• This is more of a personal preference, but I think Eq. (6) should read $d_i = d^t_i + \epsilon_i + z$.
In my view, this translates more clearly the fact that retrieved measurements derive from a true value affected by random and systematic errors, and not the other way around.
• Some question marks are present in the revised manuscript where references should be, presumably from missing LaTeX bibliography pointers.
References
Kuppel, S., Chevallier, F. and Peylin, P.: Quantifying the model structural error in carbon cycle data assimilation systems, Geosci. Model Dev., 6(1), 45–55, doi:10.5194/gmd-6-45-2013, 2013.
Richardson, A. D., Aubinet, M., Barr, A. G., Hollinger, D. Y., Ibrom, A., Lasslop, G. and Reichstein, M.: Uncertainty quantification, in Eddy Covariance, pp. 173–209, Springer, 2012.
Tarantola, A.: Inverse problem theory and methods for model parameter estimation, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2005.
Taylor, J.: Introduction to error analysis, the study of uncertainties in physical measurements, 1997.

Thank you for resubmitting your article to GMD.

I would like to inform you that your paper is again subject to major revision in light of Reviewer #1's comments. On the other hand, your reply to the comments from the other reviewer on the previous version was already fine.

If you resubmit your article, please make sure that you respond carefully to all the comments one by one.

Best regards,

Tomomichi Kato