Reply on RC2

First, I suggest the authors further consolidate the objectives of their manuscript. The introduction reads as though the authors are going to address the scaling issue from point to global models, but the results only cover model evaluation at a fixed scale (i.e. 1 km). If the objective is scaling, the claim that “the regional AquaCrop model proves to be useful in assessing crop production and soil moisture at various scales and could serve as a bridge between point-based and global models” is not well supported by the analysis in the manuscript. There is only one scale of model simulation in the current manuscript, i.e. the 1 km scale. Most importantly, it is unclear how the regional model simulation can serve as a bridge between point-based and global models. The scaling issue from point-based to global models is not addressed in this manuscript at all, but deserves further investigation within the framework the authors developed. For example, when assessing soil moisture, the authors aggregated the 1 km soil moisture simulation to 9 km. In other model setups like GGCMI, the model would run at 9 km or even larger scales. How do the performances of different model setups vary, and what are the controlling factors for those performance variations? I think those are the key questions to be answered, and they would also be more interesting for the crop modeling and land surface modeling communities. The authors can actually test those questions for both biomass and soil moisture simulations with their regional model simulation platform.

If the objective is model evaluation, I suggest the authors rewrite the motivation in the introduction. Model evaluation with new remote sensing data is also interesting, especially in the context of further data assimilation experiments (as indicated by the authors in the conclusion section), for which we need some information about the model uncertainties.
Answer: We agree and will rewrite the introduction to clarify the motivation of this study. The flexible setup of this model could serve many different applications, such as scaling, but indeed, that is not analyzed here, and this part will be removed from the introduction. Instead, this work focuses on the evaluation of large-scale model simulations, which will subsequently be used in the setup of a satellite-based data assimilation system for sequential state updating. We therefore do not look at absolute differences between model simulations and observations, but instead aim to capture the seasonal and inter-seasonal variability of the model output.
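To illustrate why this evaluation targets variability rather than absolute differences, a minimal Python/NumPy sketch of the two skill metrics referred to in this reply, Pearson correlation (R) and unbiased RMSD (ubRMSD), is given below. The function names and the synthetic series are illustrative assumptions, not the code or data used in the manuscript; ubRMSD is computed here as the RMSD after removing each series' temporal mean, which is equivalent to sqrt(RMSD² − bias²).

```python
import numpy as np


def pearson_r(model: np.ndarray, obs: np.ndarray) -> float:
    """Pearson correlation over time steps where both series are valid."""
    mask = np.isfinite(model) & np.isfinite(obs)
    return float(np.corrcoef(model[mask], obs[mask])[0, 1])


def ubrmsd(model: np.ndarray, obs: np.ndarray) -> float:
    """Unbiased RMSD: RMSD after removing each series' temporal mean."""
    mask = np.isfinite(model) & np.isfinite(obs)
    m = model[mask] - model[mask].mean()
    o = obs[mask] - obs[mask].mean()
    return float(np.sqrt(np.mean((m - o) ** 2)))


# Synthetic daily soil moisture series (illustrative only, not study data).
t = np.arange(365)
obs = 0.25 + 0.05 * np.sin(2 * np.pi * t / 365)
model = obs + 0.01 * np.cos(2 * np.pi * t / 30)
biased = model + 0.08  # same dynamics, constant wet bias

print(pearson_r(model, obs), pearson_r(biased, obs))
print(ubrmsd(model, obs), ubrmsd(biased, obs))
```

Adding a constant offset leaves both metrics unchanged, so a systematic wet or dry bias is not penalized; a multiplicative error would still change ubRMSD, but not R.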
Second, the authors make many simplifications in their model setup. For example, they set up a generic C3 crop in their simulation. However, this is not well justified. At least, I see a hot spot for corn production in their region. The authors may need to take C4 crops into consideration, or at least quantify the uncertainty of neglecting them (which is not reasonable). They also found that the soil moisture simulation performance is higher in areas with smaller AEI (indicating irrigation area fraction). However, irrigation is not simulated in their setup. This raises a question: why should we care about the performance of an unrealistic model setup? The performance evaluation is only valid when the modelers have tried their best to mimic reality. Otherwise, it is too arbitrary to say anything about the model performance when there is great uncertainty in both model simulations and satellite observations.
Answer: To be able to run the model over such a large domain, we had to make some simplifications. As previously mentioned, the model is currently set up to analyze relative changes in biomass and soil moisture, not to evaluate absolute values. The main difference between C3 and C4 crops in the AquaCrop model is the Water Productivity factor, which is much higher for C4 crops. This would, however, not affect the relative temporal pattern of biomass production, which is what we analyze here. We would also like to emphasize that our aim is not to estimate crop yield, which would require much more specific information about the crop type, but to estimate variations in soil moisture and crop biomass over time. With the subsequent data assimilation system, we want to (i) further correct the temporal variability in the simulations via state updating and (ii) possibly correct the absolute values of the simulations via parameter estimation.
This will be further clarified in the introduction and new discussion section.
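The argument about C3 versus C4 crops can be made concrete with a simplified form of the AquaCrop biomass equation, B = WP* · Σ(Tr/ET0). The sketch below assumes that transpiration is unaffected by the choice of WP* and ignores the WP* adjustments for CO2, the yield-formation phase and fertility stress; the WP* values are illustrative picks within the ranges commonly quoted for C3 and C4 crops, and the Tr and ET0 series are synthetic.

```python
import numpy as np

# Illustrative normalized water productivity values (g m-2), assumed for this
# sketch: roughly 15-20 is typical for C3 crops, 30-35 for C4 crops.
WP_C3 = 17.0
WP_C4 = 33.0


def cumulative_biomass(wp_star, tr, et0):
    """Simplified AquaCrop-style biomass: B = WP* * cumulative sum of Tr/ET0.

    Assumes Tr does not depend on WP*; ignores CO2, yield-formation and
    fertility adjustments of WP*.
    """
    return wp_star * np.cumsum(tr / et0)


rng = np.random.default_rng(0)
days = 150
et0 = rng.uniform(1.0, 6.0, size=days)  # reference ET (mm/day), synthetic
canopy = np.clip(np.linspace(0.0, 1.2, days), 0.0, 1.0)  # crude canopy growth
tr = et0 * canopy * rng.uniform(0.5, 1.0, size=days)  # transpiration (mm/day)

b_c3 = cumulative_biomass(WP_C3, tr, et0)
b_c4 = cumulative_biomass(WP_C4, tr, et0)

# The C4 run is a constant multiple of the C3 run, so correlation-based
# metrics and normalized temporal patterns are identical.
print(np.corrcoef(b_c3, b_c4)[0, 1])
```

Under these assumptions, switching from C3 to C4 rescales the whole biomass trajectory by WP_C4 / WP_C3, which leaves R and the normalized seasonal pattern unchanged, while absolute-value metrics (including ubRMSD) would scale accordingly.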

Third, it seems that the transpiration simulation in AquaCrop plays a very important role in simulating biomass and soil moisture. Why not assess the transpiration simulation against flux tower and remote sensing ET data?
Answer: Thanks for the suggestion. We will look into the datasets from flux towers over Europe (at cropland sites), but a first screening shows that data accessibility and availability are somewhat limited for our simulation period. We have submitted a request to find out more, but in any case, these data would only allow us to evaluate total evapotranspiration, not transpiration separately. As an alternative, the GLEAM evapotranspiration data are satellite- and model-based and offer separate transpiration and evaporation estimates for our study domain and period. However, these data are produced at a much coarser resolution (25 km) than our model simulations. For the reasons previously mentioned, we would again resort to evaluation metrics such as correlation and ubRMSD. In short, we will include an evaluation of (evapo)transpiration, but we believe that it is likely to be less comprehensive than the evaluation we already provide with a range of soil moisture data.
Fourth, the authors jump directly to the conclusions after showing their results. Are there any insights to be discussed from this model evaluation effort? I suggest the authors bring up their most important findings and give more implications about crop model setup and evaluation at the regional scale in the discussion section, which is currently missing entirely. Otherwise, the scientific merit of this manuscript is largely limited.
Answer: Thank you for pointing this out. In the revised version, we will rearrange the discussion and add a section on the model findings in relation to this regional model setup, discussing its advantages and possible improvements.
Other comments:
L80-81: please specify the soil layer depths used in your regional setup. This is critical information when comparing your simulations with satellite-based soil moisture retrievals.
Answer: A map from ESDAC was used to define the soil depth for each pixel. The soil layering in AquaCrop will be explained more clearly in the revised manuscript, with the help of the added flowchart. The soil moisture analysis was done using the top soil compartment in AquaCrop, which is about 10 cm thick for soils deeper than 1 m.

L81-L85: more description about hydrology (runoff, percolation, …) in the model is required as evaluating soil moisture simulation performance is an important component of this manuscript.
Answer: A description of each component in the water balance will be added to the revised manuscript.
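As a compact reference for such a description, the generic daily soil water balance can be written as below. The notation is ours and purely illustrative (these are not AquaCrop variable names), and AquaCrop's routines for the individual terms are more detailed than this bookkeeping identity. In the current setup, the irrigation term is zero because irrigation is not simulated.

```latex
\Delta W_t = P_t + I_t + CR_t - RO_t - DP_t - E_{s,t} - T_{r,t}
```

Here \Delta W_t is the change in soil water storage on day t, P_t precipitation, I_t irrigation, CR_t capillary rise, RO_t surface runoff, DP_t deep percolation below the soil profile, E_{s,t} soil evaporation and T_{r,t} crop transpiration, all in mm.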
Section 2.2: it would be good to have a flowchart for the regional setup.
Answer: Thank you for the suggestion, a flowchart will be added to the revised manuscript.

L277-L278: how about also aggregating CGLS-SSM to 9 km and comparing it with the model simulations at the same scale as the SMAP data? That would be a more fair comparison.
Answer: Good suggestion. This could indeed be done to better assess the regional performance of both satellite products, and to compare SMAP and CGLS-SSM in a 'more fair' way by filtering out the noise in the CGLS-SSM product. However, our goal is not to compare the satellite products, but to use them to evaluate our 1-km model simulations. The CGLS-SSM product is currently used at the resolution for which it is intended. If useful or needed, we propose to aggregate both the 1-km model simulations and the 1-km CGLS-SSM data to the 9-km resolution of the SMAP data and include those skill metrics.
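The proposed aggregation could be sketched as a simple block average. The Python/NumPy function below is a hypothetical helper (block_mean is our own name, not a library function) that averages factor × factor blocks of 1-km pixels and discards coarse cells with too few valid pixels. It assumes that the 1-km grid is aligned with the coarse grid and that its dimensions are multiples of 9; the actual SMAP 9-km grid (EASE-Grid 2.0) is not aligned with a regular 1-km grid, so in practice a regridding step (for example, assigning each 1-km pixel centre to the 9-km cell containing it) would replace the reshape.

```python
import numpy as np


def block_mean(field: np.ndarray, factor: int = 9, min_valid: float = 0.5) -> np.ndarray:
    """Average a 2-D fine grid over non-overlapping factor x factor blocks.

    Coarse cells with fewer than `min_valid` (fraction) valid fine pixels are
    set to NaN. Assumes aligned grids with shape divisible by `factor`.
    """
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid shape must be a multiple of the aggregation factor")
    # Axes 1 and 3 index the fine pixels inside each coarse block.
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    valid = np.isfinite(blocks)
    n_valid = valid.sum(axis=(1, 3))
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / n_valid
    mean[n_valid < min_valid * factor * factor] = np.nan
    return mean


# Usage on a synthetic 18 x 18 km field (2 x 2 coarse cells).
fine = np.full((18, 18), 0.30)
fine[:9, :9] = np.nan  # e.g. a block fully masked in CGLS-SSM
fine[9:, 9:] = 0.20
coarse = block_mean(fine)
print(coarse)
```

Applying the CGLS-SSM validity mask to the model field before aggregating would ensure that both 9-km fields are averages over the same 1-km pixels.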