Comment on gmd-2021-403

The authors present an attempt at describing the implementation of a phosphorus cycle in a major land surface model and its evaluation. They compare simulated carbon variables with a few observations and provide another model-based quantification of P effects on NPP and C stocks under 2 years of elevated CO2, which is compared to existing model predictions. The inclusion of P cycles in ESMs is certainly a timely and important endeavor given the growing evidence of the importance of phosphorus cycling for land surface conditions and greenhouse gas fluxes. However, my main concern is that this study adds little to existing studies: (1) the evaluation of the model falls short, so no new insights could be gained; (2) the eCO2 experiment is a repetition of the intermodel comparison of Fleischer et al. (2019) and provides no new insights. Potential new contributions could have been: (1) Resolving root exudates. But (guessing from the incomplete model description) it seems to be simply a relabelling of the 'excess NPP' (i.e. NPP which cannot be allocated to new biomass growth; Thornton et al 2007, Goll et al 2012) as 'root exudates'. On top of that, no attempt has been made at its evaluation, and the authors seem to confuse observed BP with NPP (BP + root exudates). Thus, I am not sure anything new has been learnt here. (2) The use of the response of biota to nutrient addition from the AFEX experiment.
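
The BP versus NPP distinction raised above can be written compactly. This is only a sketch using the definition given in this comment (NPP as BP plus root exudates); the symbol E_root for the exudation flux is my own notation and is not taken from the manuscript.

```latex
\mathrm{NPP} = \mathrm{BP} + E_{\mathrm{root}},
\qquad E_{\mathrm{root}} \ge 0
\;\Longrightarrow\;
\mathrm{BP} \le \mathrm{NPP}
```

Under this definition, a simulated NPP that includes exudation is not directly comparable to observations derived from biomass increments, which measure BP.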

cycles are closely intertwined, so you must have made some simplification to switch them on and off. As interactions between NP cycles are not explained, one can only guess how NP affects C fluxes. This prevents the reader from understanding the implications of the model results, as major underlying model assumptions are not given. It is not clear which modelling approaches are novel and which are based on concepts from previous studies/models. The majority of process representations seem to be taken from earlier work (like early P work in JSBACH, CLM, ORCHIDEE, CABLE). The authors fail repeatedly to credit earlier works (most references given are related to JULES) and to justify their modelling choices. The presentation of model equations is poor, and many inaccuracies make it hard to follow (see minor points related to equations listed below). Model input parameters are not given.
2. Some of the assumptions / choices are in contrast to current understanding and consensus, while no explanation was given.
The assumption that CNP cycles are in steady state with present-day conditions (1999-2019) is not appropriate. There are multiple lines of argument that the historic increase in CO2 has led to a progressive limitation by nutrients (e.g. Luo et al 2004, Goll et al 2012, Penuelas et al 2013) and that the present-day land carbon cycle is not in equilibrium. The non-steady state of the present-day CNP cycles is accounted for in the majority of modelling exercises (including Fleischer et al 2019 and the TRENDY modelling protocol). The historic increase in CO2 is likely a more dominant factor affecting the present-day state of the C cycle than (progressive) NP limitation. Thus this omission is a major shortcoming, in particular as model predictions which account for it exist (i.e. Fleischer et al 2019). You should at least test what the implications of this omission are for the results, or better, redo the whole experiment.
There are several highly uncertain parameters in your model. It is not clear why, in the sensitivity test, you varied only (a few) parameters which happen to be among the few observed ones rather than choosing more uncertain parameters. Besides, the impact of varying stoichiometry has been investigated in earlier models with a comparable plant P cycle (Goll et al 2012).
Some processes usually included in models have been omitted without giving any rationale. E.g. why is phosphatase-mediated mineralization omitted? It seems all major P models account for it. I don't imply the authors must account for it, but if they choose not to, an explanation should be given. Note, however, that Fig. 2 indicates that modelled soil organic P is quite high compared to observations, which suggests that the missing biochemical mineralisation (which reduces organic P by enhancing its turnover) is problematic. Why is plant-internal nutrient and carbon storage omitted? Previous modelling studies showed the importance of accounting for such storage pools, see e.g. Yang, Xiaojuan, et al. "Global evaluation of ELM v1 and the role of the phosphorus cycle and non-structural carbon in the historical terrestrial carbon balance." AGU Fall Meeting Abstracts. Vol. 2020. 2020. What about N losses like leaching or erosion, or inputs from atmospheric deposition? Are they omitted? Why?
The observed NPP is based on biomass increments (Fleischer et al 2019; SI table 1). It should thus be referred to as BP and not NPP. Likewise, the observed CUE is BP/GPP and not NPP/GPP.
There is little support that the CNP version performs better than the C version.
The short period of 2 years considered in the evaluation is inappropriate. Fluxes like NPP and GPP show large interannual variations. The fit of modelled fluxes with a long-term mean of fluxes (e.g. GPP from Fleischer et al 2019) could be merely by chance. The justification for using only two years (line 470) is not plausible, as you don't use only the soil P measurements for evaluation (see first point). You should evaluate modelled variables over a longer time period.
The soil P measurements were used to calibrate the model, not to evaluate it (misleading labelling in Table 3).
There are several datasets available to evaluate nutrient cycles in ESMs. Some of the remote sensing products have a spatial resolution sufficiently high to compare to site simulations, e.g. Sun et al 2020, Hou et al 2020. Sun, Y., Goll, D. S., Chang, J., Ciais, P., Guenet, B., Helfenstein, J., Huang, Y., Lauerwald, R., Maignan, F., Naipal, V., Wang, Y., Yang, H., and Zhang, H.: Global evaluation of the nutrient-enabled version of the land surface model ORCHIDEE-CNP v1.2 (r5986), Geosci. Model Dev., 14, 1987–2010, https://doi.org/10.5194/gmd-14-1987-2021, 2021.
There seems to be an eddy covariance tower nearby. Why has no data other than the long-term average GPP been used to evaluate the model?
It is misleading to refer to a control plot of AFEX as a fertilizer experiment. I assume there is no data available from the fertilizer experiment; otherwise you should take advantage of this data to evaluate the model (see main point). I would suggest dropping AFEX and referring to the plots as nearby plots of the K34 tower.
More specific points:

Model description
Which processes / C fluxes are affected by N and P limitation? How are interactions among N and P limitations accounted for? Does P affect N fluxes and vice versa?
Eq6,7,9,10: the flux decPi,n: you should specify the i here, otherwise it seems the same fluxes are subtracted from two different pools, i.e. double accounting of fluxes.
Eq9,10: What are the factors 0.46 and 0.54, and how are they derived? They should be parameters listed in Table 2.
Eq11: It is not clear how the delta P_O_S equals the sum of all P subpools. Do you mean the change of all P subpools?
Line 205-207: can you indicate the equation describing the fluxes (as done before, e.g. line 195)?
The calculation is missing for: total exudates and the subcomponents related to growth and spread (eq16); plant P:C ratio (eq20).
What is dC/dt in eq19,20?
Eq19: how does this NPP differ from the one in eq17, or is it the same?
Eq28: how does R_in relate to R_n?
Eq29: what are DEM_DPMn and DEM_RPMn? What is the rationale behind this formulation?
Eq30: why is the plant demand a function of soil P availability and not plant P demand?
Eq31-36: give the rationale behind the choice of equation.
Table 1: misses variables, e.g. desorption fluxes.
Table 2: misses values for parameters which are PFT or depth dependent.
Eq39,40: is nleaf the same as N?
Line 248: Zaehle & Friend has no P cycle; the reference seems inappropriate.
Line 438: why do you need to cap these fluxes? Why can these maxima be assigned to observed stocks? You should explain.
Line 458: why do you prescribe LAI? Isn't that computed prognostically by JULES?
The use of various different ways to label fluxes makes it hard to read the equations. I would suggest adopting a more homogeneous way. Here are some examples of inhomogeneous labelling: Compare the fluxes in eq9. Eq10 and eq12 use two different spellings to refer to desorption. The same goes for occlusion in eq12 and eq14. For carbon fluxes (eq16) you again use a mix of labels (e.g. Greek letters for exudates, NPP, etc.). See the labelling of the C:P ratio in eq1-3 vs eq19ff. You use the subscript P to refer to both potential and phosphorus. Eq42-44 vs the equations before: concerning leaf, root, wood C mass. I didn't list all the inconsistencies here; there are more, but it shouldn't be the job of the referee ...
Line 228-235: this is hard to follow. It seems you list definitions of potential NPP, actual NPP and BP as a justification for your modelling choice. I struggle to make the connection.
Line 244-246: maybe first follow the BP idea before. Also I think it needs some more explanation to model litter production as a function of NPP and not plant pools.
Line 284: Zaehle & Friend has no P cycle.
Line 300-306: I can't follow the equation because of the confusing labelling of C:P ratios (see point above).
Line 308: remove 'However'.
Line 381-382: this repeats info given before (line 376 ..).
Line 391-394: Please give information on how data from plots were aggregated. You also indicate vegetation information from AFEX in Table 3. How were vegetation C stocks derived?
Line 398: you need to explain how photosynthesis is implemented in JULES. Is it a big leaf approach?
Line 428-436: you should explain which observable pools were assigned to which modelled pools.
Line 470: why did you limit it to two years? Your evaluation is based on C variables, so your argumentation based on soil P measurements seems invalid.
Line 483: Be clear about the timestep used here.
Line 500-503: how was this done exactly? E.g. did you vary the ratios one by one? The relative differences between C:P ratios are critical for the availability of P. Did you test for this effect (as has been done in earlier studies, e.g. Goll et al 2012)? The N:P ratios are usually much more constrained than the C:P ratios. Did you take this information into account when varying C:P ratios?
Line 514: which pools exactly? How did you calibrate them?
Line 655: the units of GPP are misleading, i.e. monthly GPP as yr-1.
Line 674: why is it the higher soil water content which enhances uptake and not something else? This needs to be demonstrated or explained in more detail.
Line 690: this is a grassland study. You need to explain why this can be compared to a tropical forest.
Line 709: you should make the link between high competition for P and the unclear role of P for the plant CO2 response.
Line 711: it's odd to refer to the site as well documented. From your work it seems there is hardly any data available for model evaluation.
Line 712: you included all major processes in detail? Then the role of P should be clear now, or not?
Line 720-725: unclear formulations. Weren't the soil P pools optimized in JULES?
Line 727: inappropriate reference.
Line 829: 'a cornerstone': a more humble formulation could be used here.
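
To illustrate the unit issue raised for line 655, here is a minimal sketch. It assumes monthly GPP totals in g C m-2 month-1; the function names and the constant value of 250 are hypothetical and not taken from the manuscript. A monthly total and the same value expressed as an annualised rate differ by a factor of 12, so the axis label should state which of the two is plotted.

```python
# Hypothetical illustration only: names and values are assumptions,
# not taken from the manuscript or from JULES.
MONTHS_PER_YEAR = 12


def monthly_total_to_annual_rate(gpp_month):
    """Express one monthly GPP total (g C m-2 month-1) as an annualised rate (g C m-2 yr-1)."""
    return gpp_month * MONTHS_PER_YEAR


def annual_total(gpp_months):
    """Sum twelve monthly GPP totals (g C m-2 month-1) into an annual total (g C m-2 yr-1)."""
    if len(gpp_months) != MONTHS_PER_YEAR:
        raise ValueError("expected 12 monthly values")
    return sum(gpp_months)


monthly = [250.0] * MONTHS_PER_YEAR  # assumed constant monthly GPP
print(monthly_total_to_annual_rate(monthly[0]))  # annualised rate of a single month
print(annual_total(monthly))                     # total over the full year
```

With a constant monthly value both numbers coincide, which makes the factor of 12 between a monthly total and a per-year rate easy to see; with seasonally varying GPP the annualised monthly rates scatter around the annual total.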