Comment on gmd-2021-290

This manuscript describes a collection of datasets (the “curated ESGF replica”), a modified analysis package (a “paleo version” of CVDP), and a set of scripts written in NCL and Python for additional data analyses and visualization. Although not a model or stand-alone software package, it certainly codifies and advances the analysis of “CMIP-like” model-output data sets. The current approach involves downloading hundreds of files, a lot of fixing up and intermediate analysis, and finally summarization and visualization, with each investigator and group applying their own preferred set of tools, often involving other fixups and ad hoc analysis steps. The workflow described here goes a long way toward establishing a “best practices” approach to the analysis of CMIP-type data sets in general, and to PMIP paleoclimatic simulations in particular.

One thing that is absent from the beginning of the paper is a discussion of the "why do we need this?" question. A naïve reader might wonder, given a bunch of standardized files (from the ESGF), how hard it could be to produce some figures. Although the files may indeed be standardized, the models have, for example, differing resolutions, the simulations are of differing lengths, and there is considerable data reduction and transformation that cannot be transparently implemented using simple scripts. Plus, there are many, many variables; hence the utility of the present paper. The place for this discussion might be a new introductory paragraph to Section 2 (or at the end of Section 1) that describes the problem/motivation as in the previous sentences, with the solution being the workflow and tools described here. The sentence beginning on line 293 is a good overall summary, but it's at the end of the paper.
In fact, the workflow and tools could be better illustrated: the components are scattered across several GitHub repositories, and on first reading of the manuscript, I didn't immediately see how one would use the "curated ESGF replica" (or where it was). It took plowing around in repositories to figure that out. I think a figure that illustrates the data assembly, curation, development of second-order data sets (i.e. the CVDP output), and production of summaries and illustrations would be a good substitute for Fig. 2, which focuses on a different subject.
There are a lot of technical terms, both climate- and IT-related. Readers might benefit from a few short "in-line" definitions on first use, or URLs to appropriate web pages.
The term "ensemble" is used in several distinct ways here: a) to represent the whole collection of simulations, as in the title; b) to describe all of the individual simulations for a particular experiment (i.e. "lig127k ensembles", line 87); and c) to represent multi-model means (e.g. lines 94, 163). I suggest that the whole collection (i.e. PMIP4-CMIP6, PMIP3-CMIP5, etc.) be referred to as the "collection", all of the simulations for a particular experiment as the "ensemble", and the multi-model means as "multi-model means". There's also a little fuzzy usage of "ensembles" (line 89), where the term could mean "ensemble average" or "ensemble members" ("the subset of models").
The paper does need some mechanical work and smoothing out for readability.
Specific comments:
line 9: "… to test the out-of-sample response to…" It's not the boundary conditions that are being tested, but instead the response of the models (to boundary conditions different from present).
line 15: I think the non-MIP-enabled reader might benefit from a short introduction, maybe a sentence, to the overall notion of using multiple models to simulate the climate while adhering to an identical experimental design, and evaluating those simulations using observations.
line 16: "… improved forcings and boundary conditions by the new generation of climate models…". There are really two things there: updated experimental designs, and new models.
line 23: Not just topography, but also ice-sheet size.
line 41: Reverse the order of "Curating and Collating" to reflect the order that the work is done in. Also, I'm not sure "collating" is the right word; its main definition includes the ideas of critically comparing or arranging items in order, while I think the work that was done here was perhaps more like "collecting" ESGF data, other data, and the results of the CVDP output.
line 45: How many source files are there? (i.e. just the PMIP-CMIP ones, not the whole repository).
line 50: You might do an in-line explanation of what "r1i1p1f1" means. Same for "areacello" a few lines down.
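As an illustration of the kind of in-line explanation meant here, the sketch below unpacks a CMIP6 variant label into its four indices (realization, initialization, physics, forcing). The function name and regular expression are my own, hypothetical, and are not taken from the authors' scripts.

```python
import re

# CMIP6 variant labels have the form r<N>i<N>p<N>f<N>.
VARIANT_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_variant_label(label):
    """Split a CMIP6 variant label into its four integer indices.

    Hypothetical helper for illustration only.
    """
    match = VARIANT_RE.match(label)
    if match is None:
        raise ValueError(f"not a CMIP6 variant label: {label!r}")
    r, i, p, f = (int(group) for group in match.groups())
    return {
        "realization": r,      # ensemble member (initial-condition sample)
        "initialization": i,   # initialization method
        "physics": p,          # physics version of the model
        "forcing": f,          # forcing data set variant
    }


parsed = parse_variant_label("r1i1p1f1")
print(parsed)
```

"areacello" could be glossed in the same spirit: in the CMIP data request it is the grid-cell area for ocean variables, typically used as the weight when forming area means.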
line 58: "curated replica". I would reverse the order of discussion to state that you sought to build a replica of part of the ESGF database (and why), and then how it was populated.
line 60: "varying years" Does this mean varying length of simulation? Because individual models use fake years for paleo simulations, there's really little overlap in years. Did you use the last, say, 100 years of each simulation as the common period?
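To make the question concrete, here is a minimal sketch of one way a common analysis period could be defined when model years are arbitrary: take the final N years of each monthly series regardless of its year labels. The function, the 100-year default, and the toy data are all assumptions for illustration; I do not know whether this is what was done.

```python
def last_n_years(monthly_values, n_years=100):
    """Return the final n_years of a monthly series (12 values per year).

    Hypothetical helper: selects by position, so the (arbitrary)
    model-year labels of a paleo simulation do not matter.
    """
    n_months = 12 * n_years
    if len(monthly_values) < n_months:
        raise ValueError("simulation is shorter than the requested period")
    return monthly_values[-n_months:]


# Toy stand-ins for two simulations of different length.
sim_a = list(range(12 * 500))  # 500 model years of monthly values
sim_b = list(range(12 * 150))  # 150 model years of monthly values

common_a = last_n_years(sim_a)
common_b = last_n_years(sim_b)
print(len(common_a), len(common_b))
```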
line 74: "Calculation of regional mean temperature… used the adjusted monthly temperatures."
line 77: "midHolocene experiment."
line 79: I would substitute "while" for "despite".
line 80: "Interactive Atlas" This is the first mention of this, and probably should have URL.
line 85: "This has the disadvantage…" It's not really a disadvantage, it's just the way it is. Calculating annual mean temperature as the simple average of monthly mean values, without weighting for month length, will always yield a different result than calculating the annual average from, say, daily data (unless the months have an equal number of days).
line 165: "The plotting routines allow 'resources'…". This is a generic feature of NCL and is not specific to the code described here. For the benefit of non-users of NCL, it might be good to include a couple of sentences that describe the basic architecture of the language.
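The line 85 point about month-length weighting can be shown with a small sketch. It assumes a 365-day (no-leap) calendar and uses made-up monthly temperatures; the function names are hypothetical and this is not the authors' code.

```python
# Days per month in a 365-day (no-leap) calendar (an assumption;
# models use a variety of calendars).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def simple_annual_mean(monthly_means):
    """Unweighted average of the twelve monthly means."""
    return sum(monthly_means) / len(monthly_means)


def weighted_annual_mean(monthly_means, days=DAYS_IN_MONTH):
    """Average of monthly means weighted by the number of days per month.

    Equivalent to averaging the underlying daily values.
    """
    total_days = sum(days)
    return sum(m * d for m, d in zip(monthly_means, days)) / total_days


# Made-up monthly mean temperatures (degC) with a seasonal cycle.
monthly = [-5.0, -3.0, 2.0, 8.0, 14.0, 18.0, 21.0, 20.0, 15.0, 9.0, 3.0, -2.0]

unweighted = simple_annual_mean(monthly)
weighted = weighted_annual_mean(monthly)
print(f"unweighted: {unweighted:.3f}  weighted: {weighted:.3f}")
```

With a seasonal cycle the two values differ slightly; they coincide only when every month has the same length or the monthly means are all identical.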
line 189: Define "Docker" (or provide a URL), and "bespoke NCL analyses".
line 197: "These" Ambiguous. "summary data" or "CVDP output"?
line 198: "… collect together a single piece of information…" Single pieces?
line 204: "the global-mean surface temperature (something) also includes"
line 205: "equilibrium climate sensitivity"?
line 205: "temperature changes" Does this mean "long-term mean differences"?
line 209: "… climate models as reported in the Technical Summary…"
line 216: "… and monsoon (something)…"
line 222: I think "temperature change" is ambiguous. These are long-term mean differences, paleo minus present (the usual convention, but not everybody knows that).
line 228: "The process used to create (these images? these directories?)…"
line 229: This list isn't parallel. Change (4) to "regrid the difference". ("Anomaly" in climatology generally refers to a particular observation minus its long-term mean.)
line 243: "built"
line 243: "We provide example code…"
line 249: delete "an"; hyphenate "ensemble-mean"
line 264: "The CVDP can also be used to analyse…"
line 268: Replace "at" with "is"?
line 283: "NAF expansion" It's not the region (NAF) that's expanding, but the areal extent of the monsoon in that region.
line 295: "…this manuscript obviates…" The manuscript surely helps explain things, but it's the modification of the CVDP and the NCL and Python scripts that will reduce the need for reverse engineering.