The data request of the Coupled Model Intercomparison Project Phase 6 (CMIP6) defines all the quantities from CMIP6 simulations that should be archived. This includes both quantities of general interest needed from most of the CMIP6-endorsed model intercomparison projects (MIPs) and quantities that are more specialized and only of interest to a single endorsed MIP. The complexity of the data request has increased from the early days of model intercomparisons, as has the data volume. In contrast with CMIP5, CMIP6 requires distinct sets of highly tailored variables to be saved from each of the more than 200 experiments. This places new demands on the data request information base and leads to a new requirement for development of software that facilitates automated interrogation of the request and retrieval of its technical specifications. The building blocks and structure of the CMIP6 Data Request (DREQ), which have been constructed to meet these challenges, are described in this paper.
The Coupled Model Intercomparison Project Phase 6 (CMIP6) seeks to improve understanding of climate and climate change by encouraging climate research centres to perform a series of coordinated climate model experiments that produce a standardized set of output. Twenty-three independently led model intercomparison projects (MIPs) have designed the experiments and have been endorsed for inclusion in CMIP6
The resulting collection of output variables (usually in a gridded form covering the globe and evolving in time) and the associated temporal and/or spatial constraints on them are referred to as the CMIP6 Data Request (DREQ). The modelling centres participating in CMIP6 are now archiving the requested model output and making it available for analysis. The DREQ is significantly more complicated than the data requests from previous CMIP phases, complexity which arises from the size of CMIP6 and the inter-relationships of MIPs. In this paper we describe the challenges, introduce the tools which were provided to capture and communicate the DREQ, provide some headline statistics associated with the DREQ, and outline some of the problems encountered and potential solutions for future exercises.
The challenges in the informatics domain associated with specifying a vast range of technical information are compounded by organizational and communication challenges associated with the diverse range of stakeholders and scientific contacts, many of them in ad hoc organizations which are themselves evolving in response to the broader CMIP challenge.
In Sect.
In the 1990s the data request for the first Atmospheric Model Intercomparison Project
The DREQ builds on the methodology established to provide those lists but has been adapted and extended to deal with new challenges both in the complexity of the underlying science and in the nature of the expanding community.
The transition from CMIP5 to CMIP6 is described in
The endorsed MIPs are organized by researchers with an interest in addressing specific scientific questions with the CMIP models. As part of the endorsement process, each MIP must demonstrate the backing of modelling groups who will execute the numerical experiments they specify.
The challenge of the process arises from the scale and diversity of the subject matter. The 23 participating MIPs are all international consortia, some of them organized many years ago and others formed specifically for the CMIP6 exercise.
The syntax of the technical requirements relies largely on the NetCDF Climate and Forecast (CF) metadata conventions
Evolving requirements added complexity to the design and implementation of the DREQ. These requirements arose through interactions between the data request, the MIPs, the committees governing CMIP6 and other elements of the infrastructure described in
The diagram of a section of floating land ice and some sea ice. The vertical black lines delineate the boundaries of five hypothetical grid boxes.
The sophistication of climate models continues to increase
The models, and hence the variables described in the DREQ, distinguish between land ice formed on land from the consolidation of snow and sea ice formed at sea by the freezing of sea water. They have different properties, both at the microscopic scale (land ice generally contains trapped air bubbles) and at the macroscale (sea ice is typically up to a few metres thick and land ice is often hundreds of metres thick). A few of the details shown in the figure are represented for the first time or better represented in some CMIP6 models. These include the representation of sea water extending under floating ice shelves, more detailed representation of snow on ice (with different model representations of snow on sea ice versus snow on land ice), more detailed representation of snow and other frozen precipitation, and both the representation of melt pools on sea ice and potential ice covering of those melt pools.
In the atmosphere, snow is made up of ice crystals and it is standard usage to consider “snow” as part of the atmospheric ice content. On the land surface, however, a snow-covered surface is generally understood to be distinct from an ice-covered surface. Hence, at the surface we have parameters for heat fluxes from snow to ice and rates of conversion from snow to ice (i.e. a mass flux from snow to ice). This distinction may sound obvious, but this subtle shift in the relationship between snow and ice occurring when the snow lands on the ground or on surface ice can cause confusion in technical terms.
In the CMIP5 climate simulations the boundary between land and sea was clearly defined and fixed in time, but, in at least some models, the CMIP6 ensemble introduces more complexity. For the first time, some models have a realistic simulation of floating ice shelves. These deep layers of ice form on land but flow to cover large areas of ocean such as the Weddell Sea. The extent of the ice shelves can also, in a small number of experiments and models, vary in time. This introduces a range of possible interpretations for the boundary between land and sea: the leading edge of the ice shelf, the grounding line underneath the ice shelf or perhaps the line where mean-sea-level intersects the surface under the ice.
In the context of CMIP6, the Earth surface modelling is mainly motivated by a desire to represent energy and material cycles that affect the climate. For these purposes, it generally makes sense to ignore these distinctions between grounded ice sheets, floating ice shelves, and bare land masses. Hence, for the data request, most surface
Categories of DREQ variables. The second column shows the number of DREQ MIP variables which fall into each category. These six categories account for over 50 % of the variables in DREQ.
The complexities that we see in the cryosphere apply right across the domain simulated by CMIP6. Table Animals digesting plant matter
Alongside the multiplicity of variables is a multiplicity of potential applications, not all of which require the highest possible output frequency – which is fortunate, as it would be completely infeasible to archive all variables at high frequency. However, this leads to the requirement of identifying, and specifying, output frequency requirements. In some cases output frequency can be reduced by carrying out processing within the simulation so that only condensed diagnostics are needed, and, in others, snapshots are all that is required. In all cases, the output frequency is related to potential application objectives.
The DREQ is designed to support a wide range of users belonging to four broad categories: the MIP science teams, modelling centres (data providers), infrastructure providers and data users.
The MIPs contributing to CMIP6 provide input into the DREQ but also use it to coordinate their requirements with other MIPs and to obtain quantitative estimates of the data volumes associated with their planned work.
The modelling centres have two independent uses of the DREQ: first as a planning tool and second as a specification for the generation of data. When used as a planning tool, it allows for exploration of the consequences of various levels of commitment in terms of data volumes and numbers of variables. When a centre has begun generating data, the DREQ provides the specifications for each variable.
The main infrastructure providers, who depend on the DREQ, are the developers of the Climate Model Output Rewriter (CMOR) package,
Users are mainly expected to use portal search interfaces (e.g. the ESGF search interface) to locate existing CMIP6 data, but, especially in early stages, may also rely on the DREQ to determine what data may eventually be found there.
The timetable for generation of the DREQ did not allow for a formal specification of technical requirements. The following list sets out the high-level requirements that emerged from a range of informal discussions:
(a) provide feedback to MIPs on feasibility of data requests, especially regarding estimated data volumes;
(b) provide precise definitions and fully specified technical metadata for each parameter requested;
(c) provide a programmable interface that supports automated processing of the DREQ;
(d) support synergies between MIPs, maximizing the reuse of specifications and of data.
Item (a) is extremely important because attempting to store all variables at high frequency for all experiments would be impractical, resulting in unmanageable data volumes. Data volume estimates provided through the DREQ can only be indicative because the actual volumes will be influenced by many choices taken by modelling groups during the implementation of the request, but these estimates have nevertheless provided a useful guide for resource planning. CMIP gains immense impact from the synergies of the many science teams working on overlapping science problems. The synergies (d) supported by the DREQ include providing standard definitions of diagnostics which can be used across multiple MIPs and making it possible for related MIPs to request output from each other's experiments.
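The role of such indicative estimates can be illustrated with a minimal sketch. The helper name, the 4-byte value size, the 1-degree grid, the 19 levels and the 100-year run length are all assumptions chosen for illustration; none of them is taken from the DREQ software, and compression is ignored.

```python
def estimate_volume_bytes(n_lat, n_lon, n_levels, n_times, bytes_per_value=4):
    """Uncompressed size of one gridded variable (hypothetical helper)."""
    return n_lat * n_lon * n_levels * n_times * bytes_per_value


# Assumed example: a 1-degree global grid (180 x 360) and 100 simulated years.
YEARS = 100
monthly_2d = estimate_volume_bytes(180, 360, 1, YEARS * 12)
daily_2d = estimate_volume_bytes(180, 360, 1, YEARS * 365)
# 19 vertical levels is an assumed value, not a DREQ specification.
six_hourly_3d = estimate_volume_bytes(180, 360, 19, YEARS * 365 * 4)

for label, size in [
    ("monthly 2-D", monthly_2d),
    ("daily 2-D", daily_2d),
    ("6-hourly 3-D", six_hourly_3d),
]:
    print(f"{label}: {size / 1e9:.2f} GB")
```

Even in this simplified form, the spread of several orders of magnitude between a monthly surface field and a 6-hourly three-dimensional field shows why output frequency has to be specified variable by variable.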
Delivering on the above high level requirements led to four further technical requirements:
the utilization of a flexible structured database rather than simple lists;
an informative human interface;
an application programming interface to provide support for automation;
regular systematic checks to enforce consistency of technical information.
Many of these requirements were already recognized in CMIP5; the major advances in CMIP6 were the ability to tailor data needs to each individual experiment and its scientific goals and the introduction of a programmable interface supporting automated processing of the DREQ.
Choices confronting data providers within the CMIP6 Data Request (DREQ).
The intent of the DREQ is to provide all the information needed for a modelling group to archive variables of interest for subsequent analysis. In doing so, it must support the CMIP ethos of both facilitating intercomparison of an inclusive range of models and addressing significant new areas of climate science. It must also facilitate contributions from both well established and new participants.
In order to achieve this, CMIP6, following the practice of earlier CMIP phases, allows participating institutions to be selective about the range of experiments they conduct and the diagnostics that they generate. This is facilitated by experiments defining various levels of priority for the variables requested. Hence, although the DREQ specifies all the variables requested for each experiment and ensures coherence in the data archive, it also allows some flexibility.
Table
This approach ensures that CMIP has a large and representative model ensemble, but it also means that users who would like to have all models running the same collection of experiments and producing the same set of variables will not find the consistency that they want. The data provided by some models will be more limited than for others.
To ensure some consistency across the CMIP archive, the DREQ is structured to provide a menu of choices defining blocks of variables with differing priorities and scientific objectives.
The data request contains an extensive range of specifications which define climate data products, which will be held in the CMIP6 archive. CMIP6 – Coupled Model Intercomparison Project Phase 6: See WIP position papers
In order to manage these specifications, which are aggregated across the many participating endorsed MIPs, the specifications themselves are required to fit within an information model, which we call the Data Request Information Model (DRIM) to distinguish it from the data model of the NetCDF files described by
The nature of the process of establishing the CMIP6 Data Request has required that the DRIM itself evolve as information is gathered. In order to manage this process, the DRIM is constrained to stay within a predefined framework.
The schematic structure of the DREQ, showing the three key sections: Framework, Configuration and Content.
F1: a set of core attributes are used to define additional attributes (see also Table
The DREQ is constructed through three key sections: Framework, Configuration and Content, which are shown schematically in Fig.
Taking these in reverse order, the
The The redundancy between “unid” and “units” has not yet been eliminated because in the absence of a fully developed suite of tools for managing linked content, such redundancy has some value. It allows for easy reading of content (via the This attribute is not fully implemented in the existing DREQ.
The
Example attributes of the MIP variable record for
The reference document for the data request content is an XML document
The request document aims to be self descriptive: each record is defined by its attributes and for each attribute there is a record defining its role and usage. The apparent circularity is resolved as shown in Table Sections and Attributes:
The DREQ is presented as a document of 33 sections, where each section has the following characteristics:
The section is described by eight attributes;
Each section contains a list of records, each having a set of attributes;
Each record attribute is defined by the properties listed in Table
Main elements of the DREQ schema. The rounded double-edged blue shapes represent the core request elements (Sect.
The core DREQ sections are shown in Fig.
Each MIP Variable element may be used by multiple CMOR Variable elements, which specialize the definition of a quantity by specifying its output frequency, coordinates (e.g. should it be on model levels in the atmosphere or pressure levels), masking (e.g. eliminating all data over oceans), and temporal and spatial processing (e.g. averaging or summing). For instance, the near-surface air temperature is a MIP variable,
Each MIP determines which CMOR variables are needed for its planned scientific work and is asked to assign each variable a priority from 1 to 3, with 1 being the most important. The Request Variable section specifies variable priority on an experiment-by-experiment basis, leading to over 6000 distinct Request Variable elements.
The 3-level hierarchy of MIP Variable, CMOR variable and Request Variable provides some flexibility to reuse concepts, improving consistency in the DREQ. The foundation is provided by standard names from the CF convention: 927 of these are used in the CMIP6 Data Request and for 728 of these there is a unique associated MIP variable.
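The three-level hierarchy can be pictured with a minimal sketch using Python dataclasses. The class and field names, the variable labels, the requesting MIP names and the `select` helper are illustrative assumptions and do not reproduce the actual DREQ schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MipVariable:
    label: str          # short name of the physical quantity
    standard_name: str  # CF standard name
    units: str


@dataclass(frozen=True)
class CmorVariable:
    mip_variable: MipVariable  # the quantity being specialized
    frequency: str             # output frequency, e.g. "mon" or "day"
    cell_methods: str          # temporal and spatial processing


@dataclass(frozen=True)
class RequestVariable:
    cmor_variable: CmorVariable
    requesting_mip: str  # hypothetical MIP name
    priority: int        # 1 (most important) to 3


def select(requests, max_priority):
    """Keep only the requests a group commits to (hypothetical helper)."""
    return [r for r in requests if r.priority <= max_priority]


# One MIP variable reused by two CMOR variables with different frequencies.
tas = MipVariable("tas", "air_temperature", "K")
tas_mon = CmorVariable(tas, "mon", "area: time: mean")
tas_day = CmorVariable(tas, "day", "area: time: mean")

requests = [
    RequestVariable(tas_mon, "MIP-A", 1),
    RequestVariable(tas_day, "MIP-A", 2),
    RequestVariable(tas_day, "MIP-B", 3),
]
for r in select(requests, max_priority=2):
    print(r.requesting_mip, r.cmor_variable.frequency, r.priority)
```

Because the definition of the quantity lives only in `MipVariable`, a correction to its units or standard name propagates to every CMOR and Request Variable that refers to it, which is the consistency benefit of the hierarchy.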
The CF Standard Name may be reused multiple times: 145 standard names used twice and 25 used three times. The standard name reused most often (33 times) is
There is a similar story with the relationship between MIP variables and CMOR variables: 857 MIP variables are associated with a unique CMOR variable, 283 have two and 57 have three.
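Reuse statistics of this kind can be obtained by counting how often each parent record is referenced. The sketch below is a minimal illustration, assuming a hypothetical list of (standard name, MIP variable) pairs; the pairs shown are examples chosen for the sketch and are not extracted from the DREQ.

```python
from collections import Counter

# Hypothetical (CF standard name, MIP variable label) pairs.
pairs = [
    ("air_temperature", "tas"),
    ("air_temperature", "ta"),
    ("air_temperature", "tasmax"),
    ("eastward_wind", "ua"),
    ("eastward_wind", "uas"),
    ("precipitation_flux", "pr"),
]

# Number of MIP variables attached to each standard name.
uses_per_name = Counter(name for name, _ in pairs)

# Histogram: how many standard names are used once, twice, three times, ...
histogram = Counter(uses_per_name.values())

for n_uses in sorted(histogram):
    print(f"{histogram[n_uses]} standard name(s) used {n_uses} time(s)")
```

The same two-step count applies unchanged to the relationship between MIP variables and CMOR variables.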
The MIP variable which is most heavily reused is “
The
When MIPs request data, they need to provide information about the experiments that the data is required from: we do not expect all defined variables to be provided from all experiments, as that would generate substantial volumes of unnecessary output.
The process of linking the 6423
The
The sections denoted by orange chamfered shapes in Fig.
The
The central role of the changes in atmospheric composition in the climate is shown by the fact that the most frequently used units of measure are
mass fluxes (kg m
The
The
The
The
The DREQ sections labelled
The
The
The DREQ can be thought of in terms of triads (or triples) linking variables, experiments and objectives. That is, whenever a variable is requested from an experiment, it is linked to one or more objectives. There are over 350 000 potential variable–experiment–objective triads in the CMIP6 Data Request, arising from various combinations of 2068 variables, 273 experiments and 93 objectives. These three-way links may be supplemented with additional information, such as specific sampling periods or a preferred spatial grid.
Fewer than 1 % of the possible combinations are used, but this is still too many to manage individually, so, rather than explicitly listing all these virtual triads, the data request organizes them in groups. This results in just 411 request links, with groups of variables needed to address one or more objectives linked to groups of experiments.
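The arithmetic behind these figures, and the way grouped request links expand into triads, can be sketched as follows. The dictionary layout, the `expand` helper, and the variable, experiment and objective labels are assumptions made for illustration; only the counts 2068, 273, 93 and the lower bound of 350 000 come from the text above.

```python
from itertools import product

n_variables, n_experiments, n_objectives = 2068, 273, 93
possible = n_variables * n_experiments * n_objectives

# "Over 350 000" is used as a lower bound, so the fraction is approximate.
used_fraction = 350_000 / possible
print(f"{possible} possible triads, about {used_fraction:.2%} used")

# Hypothetical request links: groups of variables and experiments
# tied to one or more objectives.
request_links = [
    {
        "variables": {"tas", "pr"},
        "experiments": {"historical", "piControl"},
        "objectives": {"obj-1"},
    },
    {
        "variables": {"tas"},
        "experiments": {"historical"},
        "objectives": {"obj-1", "obj-2"},
    },
]


def expand(links):
    """Expand grouped links into the set of distinct triads."""
    triads = set()
    for link in links:
        triads.update(
            product(link["variables"], link["experiments"], link["objectives"])
        )
    return triads


triads = expand(request_links)
print(len(triads), "distinct triads from", len(request_links), "links")
```

Storing the links rather than the expanded set keeps the request small and lets overlapping requests, such as the shared triad in the two example links, collapse naturally when the set is built.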
Figure
This three-way linkage is a significant additional complexity compared to the two-way linkage between variables and experiments in CMIP5. While there were different parts of the CMIP5 request originating from different groups, the option for models to be run in support of particular scientific objectives is new to CMIP6.
If one looks at just the variable–experiment links, on average around 25 % of all variables are requested for any one experiment.
Around 80 % of all variables are requested from the historical experiment. Among the variables
The DREQ defines a large collection of diagnostic quantities and specifies, for each diagnostic, the set of experiments from which it should be provided and the objectives that it is intended to support. The objects in the centre of the diagram represent the
Much of the DREQ structure is formalized by use of the XSD mechanism; however, there is a significant amount of additional semantic structure within the DREQ that is not explicitly represented by the XSD semantics. Prominent examples include constraints on acceptable units, the use of guide values, conditional variable requirements and vertical domain requirements.
CF standard names have a
a vertical coordinate (e.g. a variable describing a property of an atmospheric layer) required by a standard name is present;
a cell methods string is consistent with the CF conventions syntax rules;
the spatial and temporal dimensions of a variable are consistent with the cell methods string (e.g. a time mean or maximum, specified in the cell methods string, requires a time dimension with a bounds attribute);
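The last of these checks can be sketched in a few lines. This is a simplified illustration, not the consistency code used for the DREQ: the function name, the set of methods treated as interval statistics and the regular expression are assumptions, and only plain "name: method" entries of the cell methods syntax are handled.

```python
import re

# Simplified pattern for a single "name: method" entry.
_ENTRY = re.compile(r"(\w+):\s*(\w+)")

# Methods assumed here to need a time interval, hence time bounds.
_INTERVAL_METHODS = {"mean", "maximum", "minimum", "sum"}


def check_time_consistency(cell_methods, dimensions, time_has_bounds=False):
    """Return a list of problems found (hypothetical check)."""
    problems = []
    for name, method in _ENTRY.findall(cell_methods):
        if name != "time" or method not in _INTERVAL_METHODS:
            continue
        if "time" not in dimensions:
            problems.append(f"'time: {method}' requires a time dimension")
        elif not time_has_bounds:
            problems.append(f"'time: {method}' requires time bounds")
    return problems


print(check_time_consistency("time: mean", ["lat", "lon"]))
print(check_time_consistency("time: mean", ["time", "lat", "lon"], True))
```

Returning a list of messages rather than raising an exception lets a systematic check report every inconsistency in a record in a single pass.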
The CMIP5 request had four guide values for some diagnostics: minimum and maximum acceptable values and also minimum and maximum acceptable values of the global mean of the absolute value of the diagnostic. These ranges were not intended to provide any guide to physical realism, but rather to catch data processing errors such as sign errors that might arise from institutional sign conventions opposite to those of the DREQ or incorrect units (e.g. submitting data in degrees Celsius with metadata units describing the data as kelvin).
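A check of this kind might look like the following minimal sketch. The function and parameter names and the numerical guide values are assumptions chosen for illustration, and the mean is unweighted, whereas a real global mean would be area-weighted.

```python
def check_guide_values(values, valid_min, valid_max,
                       ok_min_mean_abs, ok_max_mean_abs):
    """Return warnings for values outside the four guide values."""
    warnings = []
    if min(values) < valid_min:
        warnings.append(f"minimum {min(values)} is below {valid_min}")
    if max(values) > valid_max:
        warnings.append(f"maximum {max(values)} is above {valid_max}")
    # Unweighted mean of absolute values (simplification, see lead-in).
    mean_abs = sum(abs(v) for v in values) / len(values)
    if not (ok_min_mean_abs <= mean_abs <= ok_max_mean_abs):
        warnings.append(f"mean absolute value {mean_abs} is outside "
                        f"[{ok_min_mean_abs}, {ok_max_mean_abs}]")
    return warnings


# Assumed guide values for a near-surface temperature in kelvin.
GUIDE = dict(valid_min=170.0, valid_max=350.0,
             ok_min_mean_abs=260.0, ok_max_mean_abs=300.0)

kelvin = [253.15, 278.15, 288.15, 301.15]
# The same data mistakenly submitted in degrees Celsius.
celsius = [round(v - 273.15, 2) for v in kelvin]

print(check_guide_values(kelvin, **GUIDE))
print(check_guide_values(celsius, **GUIDE))
```

The Celsius data fail both the minimum and the mean-absolute-value tests, which is the kind of processing error the guide values were meant to catch; physically implausible but correctly processed data would not necessarily be flagged.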
With a wider range of diagnostics, for CMIP6, guide values are not always appropriate and/or available (e.g. for novel diagnostics). The DREQ supports a three-level indication of the robustness of any specified guide values, to avoid inappropriate warnings. As an example, an analysis carried out by
The DREQ schema allows for the specification of conditionally requested variables, though this feature is not implemented for all relevant variables. For instance, there is a model configuration option
Different MIPs have different requirements for data on pressure levels such as a need for zonally averaged data on 39 levels or high-frequency data on 3 pressure levels. In total there are 10 different pressure axes defined as part of level harmonization in the DREQ (Fig.
The pressure levels used for atmospheric variables in the DREQ. The right-hand column, titled “single”, contains pressure levels used for single-level variables. Other columns represent collections of levels used as a vertical axis for a range of requested parameters. Black rectangles indicate a level which occurs in only one column.
The
The DREQ content is provided as a version-controlled XML document complying with the schema, but a range of interfaces are provided in order to make the contents more accessible. The use of XML documents ensures robust portability and allows users to import the DREQ into their own software environments.
For users who do not wish to confront the details of the XML schema, alternative views are provided by the website and by the DREQ Python API.
The website provides a complete view of the DREQ content in linked pages and also a range of summary tables as spreadsheets. These include, for instance, lists of variables requested by each MIP for each experiment.
The python package provides both a command line and a programming interface.
The python code is designed to be self descriptive. Every record, e.g. the specification of a variable, is represented by an instantiated class with an attribute for each property defined in the record.
For example, if
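The idea of one object per record, with one attribute per property, can be illustrated with a minimal sketch based on the Python standard library. The XML element names, the sample content and the `load_records` helper are assumptions made for illustration; they do not reproduce the actual DREQ schema or the interface of the DREQ Python package.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace

# Hypothetical, heavily simplified stand-in for a request document.
SAMPLE = """
<request>
  <section label="var">
    <item label="tas" title="Near-Surface Air Temperature" units="K"/>
    <item label="pr" title="Precipitation" units="kg m-2 s-1"/>
  </section>
</request>
"""


def load_records(xml_text):
    """Map each section label to a list of record objects."""
    root = ET.fromstring(xml_text)
    sections = {}
    for section in root.findall("section"):
        # Every XML attribute of a record becomes a Python attribute.
        sections[section.get("label")] = [
            SimpleNamespace(**item.attrib) for item in section.findall("item")
        ]
    return sections


records = load_records(SAMPLE)
tas = records["var"][0]
print(tas.label, tas.title, tas.units)
print(sorted(vars(tas)))  # the record lists its own properties
```

Because the attributes are generated from the document rather than hard-coded, a new property added to a section appears on the corresponding objects without any change to the code.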
The DREQ was version-controlled with a three-element version number, such as
The CMIP6 Data Request, or DREQ, provides a consolidated specification of the data requirements of the 23 endorsed MIPs. (Table B1 has 25 rows because it also includes “DECK” and “CMIP”, which refer to activities that have a role analogous to MIPs in the DREQ: “DECK” specifies a collection of experiments and “CMIP” specifies a set of data requirements.)
The data request has a complex structure which arises from the inherent complexity of the problem: not only are there many more MIPs and experiments than in previous CMIP exercises but also not all modelling centres expect to address all the objectives of individual experiments, let alone all MIPs. This means that the request infrastructure has to handle varying aggregations across the over 350 000 potential combinations of variables, experiments and objectives and deliver the appropriate metadata information, lists, and summaries for the groupings which arise. In practice, 411 groups are needed to serve the objectives which have been extracted from the experiment definitions.
The design of the data request delivers a separation of concerns between a request framework, a configuration which specifies the sections and attributes of the request, and the actual content. In each domain (framework, configuration, content) there are information components (schema, instances) and code to support the use of that information.
Resolving the original ambiguities and errors in the specifications of diagnostics has resulted in frequent updates to the DREQ documents that, although cleanly version controlled, caused significant delays and inconvenience for those attempting to begin simulations as the output configuration was changing. Most of these arose not from the data request machinery but upstream in the definitions of the MIPs, experiments and output requirements.
The formal schema developed for CMIP6 establishes a robust structure, but it has some clear limitations. There are a number of rules governing the content which are not captured by the schema and arise from a semantic mismatch between the notion of a variable and its implementation in the CF conventions for NetCDF.
For example, certain cell methods strings, such as
There are also issues around variable definitions, both in the data request, and in the conventions themselves. For example, variable names containing abbreviated references to parts of the variable definitions (e.g. “sw” for “shortwave”, “lw” for “longwave”) lead to both inconsistency and transcription errors.
Similarly, some CF Standard Names encode information about the nature of physical quantities and the relationships between them.
However, there are variations in the syntax (e.g. variables relating to nitrogen mass may contain either
There are a number of areas where technical improvements can be made to support future CMIP activities and, potentially, related work outside CMIP.
As discussed in Sect. 4.2.3 above, there are a number of areas where the DREQ intersects with ES-DOC and CVs. There is room for closer semantic alignment, as well as some streamlining of information flow between the MIP teams and those developing the technical documents and infrastructure. Significant overlaps with ES-DOC occur in the definitions of experiments, potential model configurations, conditional variables and objectives. Some further rationalization of the interfaces between ES-DOC, the data request, and the controlled vocabularies prior to new experiment and MIP design will aid all parties.
More use of reusable and extensible lists is also anticipated. One obvious way forward would be to aim for future MIPs to be able to exploit existing and reusable variable lists, either as is or with managed extensions.
The data request is complex and establishing and upgrading the content of different components requires different communication approaches. This can be seen by comparing just two of the many components:
The The definition of parameters in the
Upgrades to these two components are in some senses orthogonal, impacting on different groups. Further partitioning of the data request to facilitate more transparent management of request upgrades would be desirable.
Such partitioning may also address complexity in the data request itself, ideally allowing more agility in its specification and use.
In June 2018 a first meeting of a data request support group (DRSG) was convened with the intention of broadening the engagement in the data request design activities. This meeting established some objectives for future work
Following this and subsequent discussions we recommend the following:
There needs to be clear guidance from the CMIP panel as to the central importance to the modelling groups of early and robust resource planning. MIPs should, early in the endorsement process, provide clear information about the expected number of simulation years needed for computation and the storage volume requirements. The infrastructure teams would then be able to monitor technical compliance with these resource envelopes as the experiment documentation and request specifications are compiled. The difficulties of resource estimation are compounded by the fact that, at the start of the process, the modelling groups are generally not able to predict the spatial resolution of the models they will be using when the computations finally get under way.
Endorsed MIPs should be required, as part of endorsement, to identify a technical expert responsible for liaising with and supporting the data request. Clear documentation should be in place for these technical experts so that expectations are clear as to what is required.
Clear and consistent version information should be provided in the web interface.
These steps would significantly reduce bottlenecks in the preparation for future CMIP exercises and minimize the burden on both the scientific leaders of the MIPs and the modelling groups.
The entire CMIP process is predicated on producing data for analysis, informing both science and policy. The central importance of a data request to those goals is obvious, but the underlying obstacles to the construction of a well defined request are often unclear. We cannot take it for granted that the goals of participating science teams will be met without detailed attention to output requirements, particularly when, as in CMIP, so much of the value arises from the interactions between MIPs.
This detailed attention is only going to become more important in the future as the diversity of the Earth system modelling community grows and the pressure for efficient use of the computing resources needed to carry out advanced simulations and store output becomes greater. Getting output descriptions right will be crucial to delivering and evaluating scientific benefits, and to developing the necessary infrastructure.
The growing dependency on CMIP products by a broad sector of the research community and by national and international climate assessments, services, and policy-making means that CMIP activities require substantial efforts in order to provide timely and quality-controlled model output and analysis.
Although CMIP has been extraordinarily successful and leverages a large investment from individual countries, there are aspects that are fragile or unsustainable due to a lack of sustained funding. The impressive CMIP impact is highly dependent on volunteer efforts of the research community and individual scientists who contribute to the underlying essential infrastructure.
CMIP has now reached a stage where certain components and activities require sustained institutional support if the programme as a whole is to meet the growing expectation to support climate services, policy and decision-making. Of particular urgency is the systematic development of forcing scenarios that require institutionalized support so that quality-controlled datasets and regular updates can be provided in a timely fashion. In addition, a more operational infrastructure needs to be put in place so that core simulations that support national and international assessments can be regularly delivered. This includes the oversight; development; and maintenance of the data requests, standards, documentation, and software capabilities that make this collaborative international enterprise possible.
A specific resolution seeking the support of the World Meteorological Organization (WMO) for CMIP was presented and approved at the 18th World Meteorological Congress, held from 3 to 14 June 2019. The resolution drew WMO members' attention to the importance of CMIP and its critical role in supporting the global climate agenda. Members were requested to contribute institutional, technical and financial resources as necessary to ensure the delivery of sustainable and robust CMIP and CORDEX (Coordinated Regional Climate Downscaling Experiment) climate change projections to the IPCC.
The CMIP6 Data Request (DREQ) relies heavily on the Climate and Forecast (CF) metadata convention.
A number of modifications were required either to deal with new metadata structures or to clarify the interpretation of metadata constructs employed in the past.
These were all discussed in the CF discussion forum maintained by the Lawrence Livermore National Laboratory.
Temporal averaging over a region specified by a time-varying mask offers some particular challenges. A long discussion (“Time mean over area fractions which vary with time [no. 152]”) established a clear protocol for expressing the concept using the
Under the CF convention, variables can refer to geographical regions either by using the name of a region from the approved list or by using an integer flag. Some wording in the conventions document was ambiguous about the validity of the latter approach: this has now been clarified to allow for the use of flags
(“Clarification of use of standard region names in
Many standard names state that additional information should be supplied in additional CF variable attributes or impose requirements on the dimensions. Such rules are not currently checked by the CF checker, making their status in the convention ambiguous. The discussion “Requirements related to specific standard names [no. 153]” is still open, but it has led to a proposal for a specific set of rules to be applied to the data request in order to ensure reasonable completeness of metadata.
The proposal “More than one name in Conventions attribute [no. 76]”, which was made long ago, has been concluded. It allows the CF convention to be used in parallel with other compatible conventions, which is required for use with the UGRID convention in CMIP6.
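A reader of such files needs to split the global `Conventions` attribute into its individual names. The sketch below assumes that names are separated by blank space, or by commas when commas are present; the helper `parse_conventions` and the version strings in the example are hypothetical and are shown only to illustrate the idea.

```python
def parse_conventions(value):
    """Split a Conventions attribute string into individual convention names.

    Assumption: names are blank-separated unless the string contains
    commas, in which case commas are treated as the separator.
    """
    parts = value.split(",") if "," in value else value.split()
    return [p.strip() for p in parts if p.strip()]


# Illustrative attribute value; the version numbers are assumptions.
names = parse_conventions("CF-1.7 UGRID-1.0")
uses_cf = any(n.startswith("CF-") for n in names)
uses_ugrid = any(n.startswith("UGRID-") for n in names)
print(names, uses_cf, uses_ugrid)
```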
A long discussion on “Subconvention for associated files, proposed for use in CMIP6 [no. 145]” concluded by defining a sub-convention which allows variables in other files to be referenced from the
There is an open discussion on “Extension to
A total of 552 new standard names were proposed for CMIP6, of which 349 were accepted. Names were rejected when existing terms, possibly in combination with area types and other metadata, could be used to meet the requirements. The new names make up 36 % of the standard names used in the DREQ.
The terms span a broad range of scientific domains, with new properties of aerosols, radiation, the cryosphere (including ice shelves and dynamic floating ice sheets, sea ice, and a more detailed representation of snow packs), vegetation, atmospheric dynamics and other aspects of the climate system.
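The figures quoted above imply that 203 proposals were rejected and that roughly 63 % were accepted. The short calculation below reproduces this arithmetic and, under the assumption that every accepted name is actually used in the DREQ (which the text does not state), gives a rough range for the total number of standard names used. The variable names are illustrative.

```python
proposed = 552
accepted = 349
new_share = 0.36  # new names as a fraction of standard names used in the DREQ

rejected = proposed - accepted
acceptance_rate = accepted / proposed

# Assumption: all accepted names are used in the DREQ. The quoted 36 % is
# rounded, so bracket it by half a percentage point on either side.
implied_total_low = accepted / (new_share + 0.005)
implied_total_high = accepted / (new_share - 0.005)

print(f"rejected: {rejected}")
print(f"acceptance rate: {acceptance_rate:.1%}")
print(f"implied total names: {implied_total_low:.0f} to {implied_total_high:.0f}")
```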
Labels used for collections of experiments in the DREQ and the number of experiments and variables in each collection.
Listing of the properties used to define attributes in the DREQ. Each of the 20 000 records in DREQ is defined by a selection of 288 attributes, and each of these attributes is, in turn, defined through the following properties.
DREQ sections: the DREQ database is split into the following sections, each taking the form of a database table with the number of records specified in column 2. The numbering in the “Title” column represents a provisional partitioning of records into sections.
Table showing the data volumes requested, broken down in terms of the requesting MIPs (rows) and the experiments they request data from, grouped according to the MIPs defining them. Units are terabytes (T), gigabytes (G) and megabytes (M). Data volumes are estimated for a nominal model with 1
The current version of the DREQ is available from the project website at
MJ led the development of the CMIP6 Data Request. KT has developed many of the underlying principles in the process of supporting CMIP5 and earlier phases of CMIP and contributed substantially to the harmonization and quality control of the CMIP6 Data Request. BL has contributed on the interface with ES-DOC and on the context of metadata for Earth system models. MM and SS have provided input from the perspective of the operational climate modelling centres, and they contributed significantly to the development of the request by being early adopters. AP is responsible for the procedures around the CF convention; JP and PD contributed as data coordinators for two large sections of the request, for PaleoMIP and OMIP, respectively. MR has helped to establish and maintain the governance framework which facilitated the development of the request.
The authors declare that they have no conflict of interest.
The CMIP6 Data Request is a collaborative effort that relied on substantial effort, patient input and engagement from the science teams behind the endorsed MIPs, listed in Table B1. Updates and extensions to the CF conventions and the CF standard names lists required community consensus, which emerged with the help of many regular contributors to the CF discussions, especially Jonathan Gregory.
This research has been supported by the EU FP7 Research Infrastructures (IS-ENES2, grant no. 312979), H2020 Excellent Science Research Infrastructures (IS-ENES3, grant no. 824084), H2020 Societal Challenges (CRESCENDO, grant no. 641816), the US Department of Energy (DOE) National Nuclear Security Administration (Lawrence Livermore National Laboratory, contract DE-AC52-07NA27344CE15), the US DOE Office of Science (Regional and Global Model Analysis Program, PCMDI Science Focus Area), the UKRI Newton Fund (UK Met Office Climate Science for Service Partnership Brazil), the UK Natural Environment Research Council (National Capability funding to the National Centre for Atmospheric Sciences), and the World Climate Research Programme (Joint Climate Research Fund for WGCM, CMIP and WIP activities).
This paper was edited by Sophie Valcke and reviewed by Sheri A. Mickelson and Charlotte Pascoe.