Comment on gmd-2021-16

Personally, I would be more cautious in recommending the application of PARTYSOCv2.0EU to European agricultural topsoils (line 43), since its performance was lower both in the 'leave-one-site-out' validation and in the validation on the two independent sites. This indicates that PARTYSOCv2.0EU would benefit from additional training sites; given the great variability of pedo-climatic and agroecosystem conditions in Europe, applying machine learning methods outside the range of their training data can be critical.

The methodology is amply described, as is the validation process; therefore, I have only minor recommendations.
The discussion also presents, at times, some repetitive concepts. I would also have expected more comparison with other approaches, especially in terms of cost-benefit. While this method can be applied within existing monitoring schemes (as it requires a soil sample), there is no information on the cost of the thermal analysis, its complexity, etc., which are interesting aspects if the aim is to propose a routine method.

Line 1: I would suggest adding in brackets after 'active' (with turnover time of months to a few years).
Line 77-80: it seems that the proposed approach is quite insensitive to the number of samples used to train the model. Indeed, only 7 sites are used, and the 'leave-one-site-out' validation is worse than the 'internal' one. This is a likely sign that this model, too, would benefit from additional training. So I don't understand the claim that other methods 'need to be inferred from statistical models or infrared spectroscopy'. Isn't that also what the authors are doing, training an ML model on measurements?
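To make the distinction between the two validation schemes concrete, the following is a minimal sketch of a leave-one-site-out validation. All data here are synthetic and all names (number of predictors, sites, etc.) are illustrative, not the authors' dataset:

```python
# Leave-one-site-out validation sketch: each fold withholds all samples
# from one site, so the model is always tested on an unseen site.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_samples, n_features = 105, 6
X = rng.normal(size=(n_samples, n_features))   # e.g. Rock-Eval parameters
y = rng.uniform(0.2, 0.8, size=n_samples)      # e.g. centennial SOC proportion
sites = rng.integers(0, 7, size=n_samples)     # 7 hypothetical training sites

logo = LeaveOneGroupOut()
errors = []
for train_idx, test_idx in logo.split(X, y, groups=sites):
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    errors.append(mean_squared_error(y[test_idx], pred) ** 0.5)

print(f"leave-one-site-out mean RMSE: {np.mean(errors):.3f}")
```

With only 7 sites, each fold removes a seventh of the spatial coverage, which is why this validation is typically harsher than an internal (random-split) one and why additional sites would help.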
Line 82: I generally agree, although I would like to point out that some fractionation methods don't necessarily aim at isolating kinetically defined pools, but rather pools underlying pathways of SOC formation and stabilization (e.g. the work on MEMS by Cotrufo et al.).
Lines 85-86: yes, unless models are built to predict measurable fractions (e.g. MEMS, COMISSION and others).

Fig. 1, left panel: is this a conceptual figure, or were size fractions actually analyzed by Rock-Eval?
Line 195: it seems that the authors suggest that the lowest-SOC treatment received less 'unwanted' C input, while its lower value may be due to any source of uncontrolled variability. Are the results very sensitive to this approach, and is there, conversely, any risk of underestimating the centennial carbon pool?
Line 275: maybe 'inferred' is better than 'calculated', as a fitting procedure was generally used.

Line 358: if I understood correctly, out of those 105 samples the centennial stable pool was inferred only from the LTBF and then assumed to be the same for all other treatments within the same site. I was wondering whether some agronomic practices (for instance organic amendments) could bias this assumption. In fact, as far as I understood (line 315), treatments with repeated application of some types of exogenous organic matter were not considered. My question is whether this limits the wide applicability of the method, since many soils in Europe receive manure and compost.

Table 2: adding one site (La Cabana), the rank of the variable importance changed, as did the predicted centennial SOC proportion, to an extent that differs among sites (Fig. 2b, Fig. 3a, b). Have the authors considered introducing additional variables into the Random Forest model (e.g. texture) to make it more robust?
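As an illustration of the suggestion above, one can refit the Random Forest with an extra texture variable and compare the resulting variable importances. The sketch below uses entirely synthetic data and hypothetical variable names (clay content as the texture proxy), not the authors' dataset:

```python
# Compare Random Forest variable importances with and without an added
# texture predictor (hypothetical clay %), on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 105
X = rng.normal(size=(n, 5))                      # e.g. Rock-Eval predictors
clay = rng.uniform(5, 45, size=n)                # hypothetical clay content (%)
y = 0.5 + 0.01 * clay + rng.normal(0, 0.05, n)   # target with a texture effect

base = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
ext = RandomForestRegressor(n_estimators=300, random_state=0).fit(
    np.column_stack([X, clay]), y)

print("importances without clay:", np.round(base.feature_importances_, 2))
print("importances with clay:   ", np.round(ext.feature_importances_, 2))
```

If the added variable captures site-level variation that the thermal parameters miss, its importance should be substantial and the ranking of the other predictors should stabilize when new sites are added.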