Comment on gmd-2021-361

General comments
The manuscript by Chien and colleagues describes the coupling of MOPS to FOCI and evaluates its large-scale performance against extensive observational datasets as well as other Earth system models. The latter comparison is summarized in the form of standard statistical metrics. The content is sufficiently detailed and fits well within the scope of GMD. The outline is well organized and easy to follow. Figures are clearly presented and easy to interpret. In some cases, too many references to the Appendix make the paper difficult to read, but this is a minor shortcoming. I provide some specific comments for the authors to consider to further improve the paper. The remaining minor comments should be straightforward to address.
Specific comments
While the authors provide an extensive overview of the model-data evaluation, there is little discussion of why the model performs well or badly for some variables or in some regions. For instance, Section 3.3.2 on air-sea CO2 fluxes is only five lines long. Readers can spot biases in the eastern equatorial Pacific, the Arabian Sea, and the Southern Ocean, but nothing further is said about the reasons for these mismatches. Is the high DIC bias in the bottom water mass linked to the too-strong outgassing in the Southern Ocean? For the other variables as well, it would provide valuable insight to other modelers if the authors could explain why the model behaves as it does, not only where it is biased but also where it fits the observations well (is this for the right or the wrong reasons?).
In the last paragraph of the introduction: "We also discuss the variability among ensemble members of each set-up.." I assume this refers to the three ensemble members, e.g., in the historical simulation. Unless I missed it, there is very little discussion of the differences between ensemble members.
A brief discussion of, and perspective on, future applications or development plans for MOPS could also be included in the conclusions. The authors nicely present several shortcomings and limitations of the model. What are the plans or strategies to alleviate them?
The climatology and large-scale mean states of the model are extensively evaluated, but what about the surface seasonal cycle? If this has not been done yet, I do not expect the authors to carry out a full evaluation of seasonality, but it could be mentioned among the future plans, given its importance for improving future projections.
The authors correctly state that biases in the physics could have a significant impact on ocean BGC. It would be valuable to show evidence that this is actually the case. For instance, in Sect. 3.2.1 the authors mention that the positive bias in DIC and ALK (also seen in nutrients, together with a negative bias in O2) might indicate too sluggish ventilation of deep waters. On the other hand, the simulated bias in surface productivity could lead to a similar bias. The authors could strengthen this statement by supporting it with some basic physical evaluation in the paper, e.g., how well does the physical model represent the large-scale surface circulation, the geometry of the interior water masses, the strength of the AMOC, etc.?
One limitation of MOPS is that CaCO3 dissolution is independent of the CO3 saturation state, which could affect the interior alkalinity (L233). This could also be mentioned in the conclusions, along with any plans to improve it and any implications for future projections using MOPS.
Please briefly describe the land carbon cycle component and how the land carbon fluxes are estimated. E.g., in the simulations with prescribed CO2 emissions, you show that the atmospheric CO2 is higher than the observation-based estimates; is this related to land carbon fluxes that are too low? If you calculate the land CO2 fluxes prognostically, you can estimate the historical compatible emissions following Liddicoat et al. (2021) to determine whether the land uptake is indeed too low.
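To make the suggestion concrete, a minimal sketch of a compatible-emissions diagnostic is given below. It is only an illustration, not the procedure of Liddicoat et al. (2021) in detail: it assumes a concentration-driven run, annual-mean net land and ocean uptake in PgC per year (positive into land or ocean), atmospheric CO2 in ppm at the year boundaries, and a conversion factor of 2.124 PgC per ppm. The function name and arguments are hypothetical.

```python
# Assumed conversion between atmospheric CO2 mixing ratio and carbon mass.
PGC_PER_PPM = 2.124


def compatible_emissions(co2_ppm, land_uptake, ocean_uptake):
    """Diagnose annual compatible emissions (PgC/yr) from a carbon budget.

    co2_ppm      : atmospheric CO2 at year boundaries (n + 1 values, ppm)
    land_uptake  : annual net land uptake (n values, PgC/yr, positive = sink)
    ocean_uptake : annual net ocean uptake (n values, PgC/yr, positive = sink)
    """
    if len(land_uptake) != len(ocean_uptake):
        raise ValueError("land and ocean flux series must have equal length")
    if len(co2_ppm) != len(land_uptake) + 1:
        raise ValueError("co2_ppm needs one more value than the flux series")

    emissions = []
    for i, (land, ocean) in enumerate(zip(land_uptake, ocean_uptake)):
        # Atmospheric growth over year i, converted from ppm to PgC.
        growth = (co2_ppm[i + 1] - co2_ppm[i]) * PGC_PER_PPM
        # Budget closure: emissions = atmospheric growth + land sink + ocean sink.
        emissions.append(growth + land + ocean)
    return emissions
```

Comparing such a series with the emissions actually prescribed in the ESM runs would indicate whether the residual is more plausibly attributable to the land or the ocean sink.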
In the many figures where model results are compared with observations, please state in the captions the source of the observations, including the time period covered and the corresponding references.
Minor comments
L5: air-sea gas exchange is part of the carbon and oxygen cycles. Suggest rephrasing to: "… the marine carbon, nitrogen, and oxygen cycles with prescribed or prognostic atmospheric CO2 concentration."
L9: … changes in ocean carbon and heat contents, are …
L11: remove 'also'
L21: leads to changes in ocean circulation
L22: wind patterns .. change the Southern Ocean upwelling and increase natural CO2 outgassing. The increased natural outgassing in the Southern Ocean is also shown in modeling studies (Zickfeld et al. 2007; Tjiputra et al. 2010).
L32: ocean … is required. This includes an adequate representation of the marine carbon uptake variability on …
L40: MOPS;
L42: as mentioned above, air-sea gas exchange is part of the cycle.
L44: iron and silicate
L45: 'has the advantage': I am not sure what the advantage is relative to. I presume other models also calibrate their parameters.
L75: replace 'for' with 'from'
L77: Altogether, the ocean …
L78: Could you list the tracers and the corresponding chemical elements?
L82: Together with a constant remineralisation rate, this would, in the absence …
L93: add a comma between 'burial' and 'homogeneously'; remove 'at the sea surface'
L91-98: does this mean there is no sediment module in MOPS, e.g., there is no organic matter remineralization in the sediment, hence no fluxes of carbon, O2, and ALK across the sediment-water interface?
L105: that were derived from …
L141: Are plankton, DOM, and detritus concentrations initialized to zero at the start of the spinup?
L142: specify what is "small", e.g., XX% per 100 years. Please also provide the drift rate for NO3.
L158: The remaining small …
L159: simulations, where they were ..
L164-5: did you mean "…subtracting the piControl and ESM-piControl simulation trends from the corresponding historical runs."?
L167: the absolute tracer ….. which is initialized from the end …
L170: DOP; the model ..
L179: 8 mmol C m^{-3}
L185: … by the temperature-dependent solubility. … balance between ocean ….
L197: In the interior, the model-data misfits are ..
L185-195: Could you explain the low oxygen bias in the bottom water in both the Pacific and the Atlantic? It seems consistent with the excess regenerated PO4. What causes the excess O2 bias in Fig. 4h-k? Are these biases associated with, or sensitive to, the parameter b described in A1.2?
L234: which is also consistent with …
L236: In this section, please state clearly whether the spatial distribution refers to the surface only or also includes the water column.
L240: due to the lack of …
L244: be more specific about the region, i.e., the equatorial region, because in the surface Southern Ocean both PO4 and phytoplankton in the model are higher than observed.
L250: why are primary production, phytoplankton, and zooplankton overestimated around the Equator while POP is underestimated?
L252: The spatial distribution of dissolved …
L252: Please also mention the uncertainty range (if known) of the observations.
L253-4: explain why DOP is negatively correlated with POP
L254: "too long remineralization timescale": but if you shorten the remineralization timescale, wouldn't this lead to even higher surface PO4, which is already overestimated?
L256: suggest: Statistical performance of the simulated inorganic and organic tracers
L259: the high bias here is only at 0-100 m, right?
L271: the term RMSE' first appears here; please define it.
L276: component is dynamically
L277: spatial shift in ocean current
L279-80: For unfamiliar readers such as myself, please briefly explain what BD, HD, and L1 represent, e.g., what is considered a 'good' value and why.
L287: tracers.
L290: and with a high bias
L307: ventilation, particularly in the …
L309: not sure if 'global flux' is the correct term; perhaps 'global denitrification rate'?
L345: It's not quite similar. The heat content generally has positive trends from 1900, whilst surface temperature has weak-to-no trends between roughly 1940 and 1970.
L362: The O2 anomalies …
L368: The NO3 concentrations …
L386: … the model large-scale performance is comparable with other CMIP models.
L387: The spatial patterns of …
L412: 'fixed nitrogen loss and gain': do you mean constant? Please clarify.
L421: 'light during a day' sounds strange; perhaps 'daily light intensity'?
L443: I believe you need to multiply this by phytoplankton concentration.
L447: please check eq. A6; the units don't add up.
L463: The DOP is remineralized in all …
L469: detritus (DET) is remineralized with …
L563: variable 'a' is already used as the C:P ratio; use another symbol.
L575: 'passive tracers': my understanding of passive tracers is that they have no sources or sinks in the interior. This doesn't seem to be the case in MOPS.
L590: column 3 in Weiss (1974).
L593: .. of it is buried ..
L628: suggest removing this sentence and simply citing Kriest et al. (2020) at the end of the previous sentence.
L670-1: the numbers differ from those shown in Table B1.
L676: conversion
L679: remove 'but'
L722: normalized by the observed mean
L746: is evaluated.
Fig. 16 caption: state which panels are 'column-integrated' values.
Fig. 18 caption: what is 'summary tab'?
Fig. 18 panels a-c: please use a different color for the ESM-piControl ens mean line (difficult to distinguish from ESM-Hist).
Fig. 19 caption: replace 'Differences in' with 'Cumulative'
Fig. 19 is a nice figure to illustrate the dominant role of ocean physics in regulating the spatial pattern of oceanic carbon storage (see also Tjiputra et al., 2010), but this figure is only briefly mentioned on line 341. Some explanation of why such a pattern exists would be useful. Why are there considerable differences between Hist and ESM-Hist in panel (a), e.g., between 45S and 65S? How do these differences affect the interior DIC distribution, i.e., Fig. 9 for the ESM-Hist run?
Fig. B1 caption: missing closing parenthesis after B2
Table 1:
- why use quotation marks in ESM-spinup?
- piControl description: … 500 of the ….
- ESM-piControl description: … 250 of the ….
- Hist description: … 500 of the …
- ESM-Hist description: … 250 of the …
- *Hist description: Historical simulation following the CMIP6 protocol with prescribed ….
Table 2: Are these values from a comparison with non-interpolated data (see L265)? Please clarify. Also, a brief introduction to, or motivation for, comparing your results with I2013, S2013, and K2014 would be appreciated.
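Regarding the comment on L164-5, the drift correction I have in mind could be sketched as follows. This is an illustration under stated assumptions, not a description of what the authors did: the control drift is approximated by a linear least-squares trend, the control and historical series share the same time axis, and the function names are hypothetical.

```python
def linear_trend(times, values):
    """Return the least-squares slope of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def remove_control_drift(times, historical, control):
    """Subtract the linear drift of the control run from the historical run.

    Only the trend (slope) of the control is removed, relative to the first
    time step, so the initial value of the historical series is preserved.
    """
    slope = linear_trend(times, control)
    t0 = times[0]
    return [h - slope * (t - t0) for t, h in zip(times, historical)]
```

For example, with a control run drifting by 0.5 units per year, a historical series rising by 1.5 units per year is reduced to a forced trend of 1.0 units per year. Each historical ensemble would be paired with its own control (piControl for Hist, ESM-piControl for ESM-Hist).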