Numerical simulation, and in particular simulation of the earth system, relies on contributions from diverse communities, from those who develop models to those involved in devising, executing, and analysing numerical experiments. Often these people work in different institutions and may be working with significant separation in time (particularly analysts, who may be working on data produced years earlier), and they typically communicate via published information (whether journal papers, technical notes, or websites). The complexity of the models, experiments, and methodologies, along with the diversity (and sometimes inexact nature) of information sources, can easily lead to misinterpretation of what was actually intended or done. In this paper we introduce a taxonomy of terms for more clearly defining numerical experiments, put it in the context of previous work on experimental ontologies, and describe how we have used it to document the experiments of the sixth phase of the Coupled Model Intercomparison Project (CMIP6). We describe how, through iteration with a range of CMIP6 stakeholders, we rationalized multiple sources of information and improved the clarity of experimental definitions. We demonstrate how this process has added value to CMIP6 itself by (a) helping those devising experiments to be clear about their goals and their implementation, (b) making it easier for those executing experiments to know what is intended, (c) exposing interrelationships between experiments, and (d) making it clearer for third parties (data users) to understand the CMIP6 experiments. We conclude with some lessons learnt and how these may be applied to future CMIP phases as well as other modelling campaigns.
Climate modelling involves the use of models to carry out simulations of the real world, usually as part of an experiment aimed at understanding processes, testing hypotheses, or projecting some future climate system behaviour. Executing such simulations requires an explicit understanding of experiment definitions including knowledge of how the model must be configured to correctly execute the experiment. This is often not trivial, especially when those executing the simulation were not party to the discussions defining the experiment. Analysing simulation data also requires at least minimal knowledge of both the models used and the experimental protocol to avoid drawing inappropriate conclusions. This again can be non-trivial, especially when the analysts are not close to those who designed and/or ran the experiments.
Traditionally numerical experiment protocols have appeared in the published literature, often alongside analysis. This approach has worked for years, since mostly the same individuals designed the experiment, ran the simulations, and carried out the analysis.
However, as model inter-comparison has become more germane to the science, there has been growing separation between designers, executers, and analysts.
This separation has become acute with the advent of the sixth phase of the Coupled Model Intercomparison Project
Simulation workflow in which experimental requirements (termed “Numerical requirements”) play a central role.
This increasing separation within the workflow, and between individuals and communities, leads to an increased necessity for information transfer, both between people and across time (often analysts are working years after those who designed the experiments have moved on). In this paper we introduce the “design” component of the Earth System Documentation (ES-DOC) project ontology, intended to aid in this information transfer by supporting both those designing experiments (especially those with inter-experiment dependencies) and those who try to execute and/or understand what has been executed. This ontology provides a structure and vocabulary for building experiment descriptions which can be easily viewed, shared, and understood. It is not intended to supplant journal articles, rather to provide recipes which can be reused (by those running models) and understood by analysts as an introduction to the experiment designs. We explain how it was deployed in support of CMIP6, how it has added value to the CMIP6 process, and how we expect it to be used in the future based on lessons learnt thus far.
We begin by describing key elements of simulation workflows and introduce a formal vocabulary for describing the experiments and the simulations. We provide some examples of ES-DOC-compliant experiment descriptions and then present some of the experiment linkages which can be understood from the use of our canonical experiment descriptions. Our experiences in gathering information and the linkages (and some of the missing links) required to define and document CMIP6 experiments expose opportunities for improving future MIP designs, which we present in the “Summary and further work” section.
In this section we introduce the key concepts involved in designing experiments and describing simulation workflows. We describe how this has evolved from previous work and differs from other work with which we are familiar.
The process of defining numerical experiments is potentially complex (Fig.
Once the experiments are defined (Fig.
In both generic experiment documentation and in defining data requests, it is helpful to utilize controlled vocabularies so that unambiguous machine navigable links can exist between the design documentation, simulation execution, data production, and the analysis outcomes.
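As a minimal illustration of the role a controlled vocabulary plays, the sketch below validates terms against a fixed list so that links between design documents, data, and analyses remain unambiguous. The vocabulary entries and function names here are hypothetical placeholders, not actual CMIP6 CV content:

```python
# Sketch of a controlled vocabulary check: a term is only accepted if it
# appears in the agreed list, so every document linking to it means the
# same thing. The terms below are invented for illustration only.

FORCING_TYPES = {"concentration driven", "emission driven", "prescribed", "idealised"}

def check_term(term: str, vocabulary: set) -> str:
    """Return the term unchanged if it is in the vocabulary, else raise."""
    if term not in vocabulary:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    return term

# A valid term passes through; a free-text variant is rejected, forcing
# authors to converge on shared terminology.
check_term("prescribed", FORCING_TYPES)
```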
The requisite controlled vocabulary for a numerical simulation workflow requires addressing the actions and artefacts of the workflow summarized in Fig.
As noted above, a project has certain scientific objectives that lead it to define one or more
The experiment description itself includes attributes covering the scientific objective and the experiment rationale, addressing the following questions: what is this experiment for, and why is it being done?
The
Each requirement carries a number of attributes, some mandatory and some optional, as shown in Tables
ES-DOC controlled structure for describing a forcing constraint: each attribute has a name, a Python data type (those in italics are other ES-DOC types), a cardinality (0.1 means either zero or one, 1.1 means one is required) and a description.
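The pattern described in the table (named, typed attributes with explicit cardinalities) can be sketched in Python. The class and attribute names below are illustrative stand-ins, not the actual ES-DOC/pyesdoc type definitions:

```python
from dataclasses import dataclass, field
from typing import Optional, List

# Illustrative only: mirrors the general shape of an ES-DOC structured
# type (typed attributes plus cardinalities), with invented names.

@dataclass
class ForcingConstraint:
    # cardinality 1.1: exactly one value is required
    name: str
    forcing_type: str  # would be drawn from the forcing-types controlled vocabulary
    # cardinality 0.1: zero or one value allowed
    description: Optional[str] = None
    # cardinality 0.N: any number of values allowed
    additional_requirements: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Enforce the 1.1 cardinalities explicitly."""
        if not self.name or not self.forcing_type:
            raise ValueError("name and forcing_type are mandatory (cardinality 1.1)")

co2 = ForcingConstraint(
    name="Historical CO2 concentration",
    forcing_type="concentration driven",
    description="Prescribe observed atmospheric CO2 concentrations",
)
co2.validate()
```

Encoding cardinalities as required versus defaulted fields makes incomplete documents fail early rather than circulate with missing mandatory information.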
ES-DOC forcing types controlled vocabulary; provides context for a forcing constraint.
The ES-DOC vocabulary is an evolution of the “Metafor” system
The ES-DOC controlled vocabulary is an instance of an ontology (“a formal specification of a shared conceptualization”,
The description of ontologies is often presented in the context of establishing provenance for specific workflows and often only retrospectively.
Work supporting scientific workflows has mainly been concerned with execution and analysis phases, with little attention paid to the composition phase of workflows
For the “conception phase” of workflow design, a controlled vocabulary introduced by
The notion of “an experiment” also needs attention, since the experiments described here are even more abstract than the notion of “a workflow” and cover a wider scope than that often attributed to an experiment.
Dictionary definitions of “scientific experiment” generally emphasize the relationship between hypothesis and experiment (e.g. “An experiment is a procedure carried out to support, refute, or validate a hypothesis.
Experiments provide insight into cause-and-effect by demonstrating what outcome occurs when a particular factor is manipulated.”,
The first formal attempt to define a generic ontology of experiments (as opposed to workflows) appears to be that of
With the advent of simulation, another type of experiment (beyond those defined earlier) is possible: the simulation (and analysis) of events which cannot be measured empirically, such as predictions of the state of a system influenced by factors which cannot be replicated (or which may be hypothetical, such as the climate on a planet with no continents). For climate science, the most important of these is of course the future; experiments can be used to predict possible futures (scenarios).
In this form of experiment, ES-DOC implicitly defines two classes of “controllable factor”: those controlled by the experiment design (and defined in
The rationale and need for CMIP6 were introduced in
Global model inter-comparison projects have a long history, with pioneering efforts beginning in the late 1980s (e.g.
With each phase, more complexity has been introduced.
CMIP1 had four relatively simple goals: to investigate differences in the models’ response to increasing atmospheric
In CMIP5 and again in CMIP6, there was a substantial increase in the number and scope of experiments. This has led to a new organizational framework in CMIP6 involving the distributed management of a collection of quasi-independently constructed model inter-comparison projects, which were required to meet requirements and expectations set by the overall coordinating body (the CMIP Panel) before they were “endorsed” as part of CMIP6. These MIPs were designed in the context of increasing scope, more widespread interest, and the growth of two important constituencies: (1) those designing “diagnostic MIPs”, which do not require new experiments, but rather request specific output from existing planned experiments to address specific interests, and (2) the even wider group of downstream users who use the CMIP data opportunistically, having little or no direct contact with either the MIP designers or the modelling groups who ran the experiments.
With the increasing complexity, size, and scope of CMIP came a requirement to improve the documentation of the activity, from experiment specification to data output.
CMIP5 addressed this in three ways: by documenting the experiment design in a detailed specification paper
The overview of the experiment design process given in Fig.
The semantic structure of the data request was developed in parallel to the development of the CMIP6 version of ES-DOC; each had to deal with a distinctive range of complex expectations and requirements.
Hence ES-DOC has not yet fully defined or populated the
The initial ES-DOC documentation was generated from a range of sources and then iterated with (potentially) all parties involved, which provided both challenges and opportunities. An example of the challenge was keeping track of material through changing nomenclature. Experiment names were changed, experiments were discarded, and new experiments were added. In one case an experiment ensemble was formed from a set of hitherto separate experiments. Conversely, a key opportunity was the ability to influence MIP design to add focus and clarity, including influencing those very names. For example, the names of experiments which applied sea surface temperature (SST) anomalies for positive and negative phases of ocean oscillation states were changed from “plus” to “pos” and “minus” to “neg” to better reflect the nature of the forcing and the relationship between experiment objectives and names.
The ES-DOC documentation process also raised a number of discrepancies and duplications, which were sorted out by conversations mediated by PCMDI. Many of the latter arose from independent development within MIPs of what eventually became shared experiments between those MIPs. For example, not all shared experiment opportunities were identified as such by the MIP teams, and it was the iterative process and the consolidated ES-DOC information which exposed the potential for shared experimental design (and significant savings in computational resources).
A specific example of such a saving occurred with ScenarioMIP and CDRMIP, which both included climate change overshoot scenario experiments that examine the influence of
Discrepancies also arose from the parallel nature of the workflow.
For example, specifications could vary between what was published in a CMIP6-endorsed MIP's
This process had other outcomes too: LUMIP originally had a set of experiments that were envisaged to address the impact of particular behaviours such as “grass crops with human fire management”.
Some of these morphed to become entirely the opposite of their original incarnation, such as “land-noFIRE”, where the experiment requires no human fire management (see Table
The experiments within the DECK, as described in ES-DOC. The content of this table, like all the ES-DOC tables in this paper, was generated directly from the online documentation using a Python script (details in the Appendix). The choice of content to display was made in the Python code; other choices could be made (e.g. see
The modelling CMIP6 experiments as introduced in
Somewhat naively, the initial concepts for
Increasing precision is evident throughout CMIP6 and in the documentation.
In some cases, rather than ask how it is done in a model after the fact, the experiment definition describes what is expected, as in the GeoMIP experiment G7SST1-cirrus (Table
CMIP6 is more than just an assemblage of unrelated MIPs. One of the beneficial outcomes of the formal documentation of CMIP6 within ES-DOC has been a clearer understanding of the dependencies of MIPs on each other and of experiments on shared forcing constraints. In this section we provide an ES-DOC-generated overview of CMIP6 and discuss elements of commonality and how these interact with the burden on modellers of documenting how their simulation conformed (or did not) to the experiment requirements.
CMIP6 MIPs and experiments. Individual MIPs are represented by large purple dots. Lines connect each MIP to the experiments that are related to it, which are shown as smaller blue dots. Some widely used experiments are labelled, such as the piControl, historical, amip, ssp245, and ssp585, which are used by numerous MIPs within CMIP6.
At the heart of the current CMIP process is a central suite of experiments known as the DECK (Diagnosis, Evaluation, and Characterization of Klima;
Table
Figure
DAMIP experiments and forcing constraints. Individual experiments are represented by large blue dots. Lines connect each experiment to related forcing constraints, represented by pink dots. An example of a forcing constraint might be a constraint on atmospheric composition such as a requirement for a particular concentration of atmospheric carbon dioxide. In this figure three experiments are shown with dark blue borders (piControl, historical, and ssp245); these are experiments that are required by DAMIP but are not defined by DAMIP. The forcing constraints for these three “external” experiments are used extensively by the DAMIP experiment suite.
The most-used CMIP6 experiments in terms of the number of model inter-comparison projects (MIPs) to which they contribute.
There are other shared experiments too, which bring MIPs together around shared scientific goals: land-hist jointly defined and shared by LUMIP and LS3MIP; past1000 defined by PMIP forms part of VolMIP; piClim-control defined by RFMIP forms part of AerChemMIP; and dcppC-forecast-addPinatubo defined by DCPP forms part of VolMIP. By contrast, OMIP stands alone, sharing no experiments with other MIPs.
Experiments share forcing constraints, just as MIPs share experiments.
Figure
Unique modifications appear in Fig.
The importance of the perturbation experiment pattern is further emphasized in DAMIP by noting that the three external experiments (piControl, historical, and ssp245) account for 62 % of the DAMIP forcing constraints; five of the DAMIP experiments can be completely described by forcing constraints associated with these external experiments – being different assemblies of the same “forcing building blocks”.
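The “building block” idea can be illustrated by representing each experiment as a set of forcing-constraint identifiers and deriving the sharing directly from set operations. The experiments and constraint names below are hypothetical placeholders, not the real DAMIP definitions:

```python
# Sketch of forcing "building blocks": each experiment is a set of
# forcing-constraint identifiers; shared constraints and reuse fall out
# of ordinary set algebra. All identifiers here are invented.

experiments = {
    "piControl":  {"pi-CO2", "pi-aerosol", "pi-solar", "pi-volcanic"},
    "historical": {"hist-GHG", "hist-aerosol", "hist-solar", "hist-volcanic"},
    "hist-nat":   {"pi-CO2", "pi-aerosol", "hist-solar", "hist-volcanic"},
}

def shared_constraints(exp_a: str, exp_b: str) -> set:
    """Forcing constraints reused between two experiments."""
    return experiments[exp_a] & experiments[exp_b]

def reuse_fraction(exp: str, external: list) -> float:
    """Fraction of an experiment's constraints drawn from 'external' experiments."""
    external_pool = set().union(*(experiments[e] for e in external))
    own = experiments[exp]
    return len(own & external_pool) / len(own)

# In this toy example, hist-nat is assembled entirely from constraints
# already defined by piControl and historical.
reuse_fraction("hist-nat", ["piControl", "historical"])  # -> 1.0
```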
The key role of these building blocks is exposed by placing the DAMIP experiments into sets according to which of those external experiments is used for forcing constraints (Fig.
A view of DAMIP with experiments placed in sets according to the forcing constraints they share with the external experiments: piControl, historical, and ssp245.
This framing of shared forcing constraints exposes some apparent anomalies.
Why, for example, is hist-CO2 not in the historical set? The reasons for these apparent anomalies lie in the framing of the experiments.
In the historical experiment, greenhouse gas forcing is a single constraint, which includes
It would have been possible to avoid this sort of anomaly by constructing finer constraints in the case of historical, but this would have been at the cost of simplicity of understanding (and greater multiplicity in reporting as discussed below). There is a necessary balance between clear guidance on experiment requirements and reuse of such constraints to expose relationships between experiments.
One of the goals of the constraint formalism is to minimize the burden on modelling groups.
Minimizing the burden of executing the CMIP6 experiments and the burden of documenting how the experiments were carried out (that is, populating the concrete part of the experiment definition, using the language of
Constraint “conformance” documentation is intended to provide clear targets for interpreting the differences between simulations carried out with different models.
Given that differing constraints often define differing experiments, understanding why models give different results can be aided by understanding differences in constraint implementation (in those cases where there is implementation flexibility).
Section
One can then ask, how much reuse of constraints is possible? Figure
Distribution of forcing constraint reuse across CMIP6. Forcing constraints are categorized in terms of how widely they are used. Widely used forcing constraints are used by experiments in four or more MIPs.
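Categorizing constraints by breadth of use reduces to a simple counting exercise; the sketch below uses a hypothetical MIP-to-constraint mapping standing in for the real content of the ES-DOC archive:

```python
from collections import Counter

# Hypothetical mapping from MIP to the forcing constraints its
# experiments use; in practice this would be harvested from ES-DOC.
mip_constraints = {
    "DAMIP":       {"hist-GHG", "hist-aerosol", "pi-CO2"},
    "ScenarioMIP": {"hist-GHG", "ssp245-GHG"},
    "AerChemMIP":  {"hist-aerosol", "pi-CO2"},
    "GeoMIP":      {"hist-GHG", "pi-CO2"},
    "LUMIP":       {"hist-GHG"},
}

# Count, for each constraint, how many MIPs use it...
use_count = Counter(c for cs in mip_constraints.values() for c in cs)

# ...and label constraints used by experiments in four or more MIPs
# as "widely used", following the categorization in the figure.
widely_used = sorted(c for c, n in use_count.items() if n >= 4)
```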
History suggests there has been – and continues to be – divergent understanding of instructions for the expected duration of simulations (temporal constraints), often manifested as “off by one” differences in the number of years of simulation. Such errors hamper statistical inter-comparison between simulations and can result in unnecessary effort (often expensive in human and computer time). The CMIP6 experiments have not been immune from this issue. Temporal constraints in the CMIP6 controlled vocabulary are defined in terms of a start year and a minimum length of simulation expressed in years. However, the publications by the CMIP6-endorsed MIPs often also include an end year which can be inconsistent with the minimum simulation length as described by the CMIP6-CV. The divergence in understanding generally occurs in the interpretation of the dates implied by a given start year and end year, specifically whether they refer to the beginning of January or the end of December.
A significant effort has been made by ES-DOC to identify these discrepancies and instigate their correction. ES-DOC temporal constraints unambiguously specify a start date, end date, and length for simulations and are a mandatory part of the ES-DOC experiment documentation. Despite these steps, there are still many cases where the MIPs of CMIP6 might have coordinated yet further and used the same temporal constraints for different experiments with essentially the same temporal requirements, such as those that begin in the present day and run to the end of the 21st century. These differences provide scope for further rationalization in future experiments and/or CMIP phases, leading to further simplification in analysis and savings in computer time.
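The ambiguity is easy to state concretely: given only a start year and an end year, the implied run length differs by one year depending on whether the end year is simulated in full or the run stops at its beginning. The helper names below are illustrative, not part of any ES-DOC tooling:

```python
# Sketch of the "off by one" ambiguity in temporal constraints.

def length_inclusive(start_year: int, end_year: int) -> int:
    """Years simulated if both the start and end years are run in full
    (i.e. the run finishes at the end of December of end_year)."""
    return end_year - start_year + 1

def length_exclusive(start_year: int, end_year: int) -> int:
    """Years simulated if the run stops at the beginning of January of
    end_year."""
    return end_year - start_year

# A historical-style run "from 1850 to 2014" under the two readings:
length_inclusive(1850, 2014)  # -> 165 years
length_exclusive(1850, 2014)  # -> 164 years: the classic off-by-one
```

Specifying all three quantities (start date, end date, and length), as the ES-DOC temporal constraints do, makes any such inconsistency detectable rather than silently interpreted.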
The need for structured documentation constrained by controlled descriptive terminology is not always well understood by all parties involved in creating content.
While structured scientific metadata has an important role in science communication, it exacts a cost in time, energy, and attention.
This cost causes friction in the scientific process even though it can provide the information necessary for investigators to reach a common understanding across barriers arising from distance in space, time, institutional location, or disciplinary background.
The balance between this “metadata friction” and the potential benefit in ameliorating the “scientific friction” barriers is difficult to achieve
In this paper we have introduced the ES-DOC structures for experimental design and shown their application in CMIP6. We have introduced a formal taxonomy for experimental definition based around collections of climate modelling projects (MIPs), experiments, and numerical requirements and, in particular, constraints of one form or another. These provide structure for the formal definition of the experiment goals, design, and method. The conformance, model, and simulation definitions (to be fully defined elsewhere) will provide the concrete expression of how the experiments were executed.
The construction of ES-DOC descriptions of CMIP6 experiments has been carried out mostly by the ES-DOC team, using published material, but often as part of the iterative discussions which specified the CMIP6 MIPs. These iterative discussions, led by the MIP teams, with coordination provided at various stages by the CMIP panel and PCMDI, have improved on previous MIP exercises, albeit with an increase in process overhead and still with opportunities for imprecision, duplication of design effort, and unnecessary requirements for participants. The ES-DOC experiment definitions provided another route to internal review of the design and aided in identifying and removing some of the imprecision, duplicated effort, and unnecessary simulation requirements. However, there is still scope for improving the design phase.
Earlier involvement of formal documentation would have facilitated more interaction between the MIP design teams by requiring more information to be shared earlier. Doing so in the future might allow more common design patterns, and perhaps more experiment and simulation reuse between MIPs, reducing the burden of carrying out the simulations and of storing the results. This potential gain would need to be evaluated and weighed against the potential process burden, but it can be seen that the ES-DOC experiment, requirement, or constraint definitions are relatively lightweight yet communicate significant precision of objective and method. Early involvement of formal documentation is important for building a culture of engagement. Our experience with the CMIP6 MIPs indicates that the process of providing detailed information about experiments was perceived in a positive way by groups when the intervention occurred early in the experiment life cycle. These groups also had a sense of ownership of their content. In contrast, groups who engaged later in the experiment life cycle were more likely to perceive the documentation effort as yet another burden.
Sharing of experiments and constraints is clearly common within CMIP6, but there remain opportunities for improvement in this regard.
Start year either 1850 or 1700 depending on standard practice for particular model.
This experiment is shared with the LS3MIP; note that LS3MIP expects the start year to be 1850.
The sharing and visualization of constraint dependencies (Sect.
ES-DOC remains a work in progress. It is fair to say that there was no wide community acceptance of the burden of documentation for CMIP5, but this was in part because of the tooling available then. With the advent of CMIP6, the tooling is much enhanced and available much earlier in the cycle, but both the underlying semantic structure and tooling can and will be improved. There is clearly an opportunity for convergence between the data request and ES-DOC, and there will undoubtedly be much community feedback to take on board!
ES-DOC is not intended to apply only to CMIP exercises. We believe the precision and self-consistency ES-DOC imposes on experiment design documentation should be of use even when only one or a few models generate related simulations. One such target will be the sharing of national resources to deliver extraordinarily large and expensive simulations (in time, resource, and energy) where individuals and small communities could not justify the expense without sharing goals and outputs. Realizing such sharing opportunities is often impaired by insufficient communication and documentation. We believe the ES-DOC methodology can go some way towards capitalizing on these opportunities and will become essential as we contemplate using significant portions of future exascale machines.
All the underlying ES-DOC code is publicly available at
To improve readability, a number of examples are provided in this Appendix, rather than where first referenced in the main text.
All these tables are produced by a Python script.
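As an illustrative stand-in for the kind of script used (the real tables were generated with the ES-DOC pyesdoc tooling, whose API is not reproduced here), the sketch below renders experiment documents, represented as plain dicts with hypothetical fields, as an aligned plain-text table:

```python
# Illustrative only: render a list of document dicts as a fixed-width
# text table, choosing which fields to display in code, in the spirit
# of the scripts that produced the tables in this paper.

def render_table(rows, columns):
    """Render a list of dicts as an aligned plain-text table."""
    widths = {c: max(len(c), *(len(str(r.get(c, ""))) for r in rows)) for c in columns}
    header = " | ".join(c.ljust(widths[c]) for c in columns)
    rule = "-+-".join("-" * widths[c] for c in columns)
    body = [" | ".join(str(r.get(c, "")).ljust(widths[c]) for c in columns) for r in rows]
    return "\n".join([header, rule] + body)

docs = [
    {"name": "piControl", "description": "Pre-industrial control"},
    {"name": "historical", "description": "Simulation of the recent past"},
]
print(render_table(docs, ["name", "description"]))
```

Keeping the display choice in code, rather than in the documents themselves, is what allows different papers or web views to present different slices of the same canonical content.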
The ES-DOC pyesdoc
The abrupt 4X
This experiment has an anti-forcing constraint, “Historical land surface forcings except fire management” (note also the two temporal constraint options: “Start year either 1850 or 1700 depending on standard practice for particular model.”). See
The “Increase Cirrus Sedimentation Velocity” forcing constraint is very precise about the change to be made: “Add a local variable that replaces (in all locations where temperature is colder than 235K) the ice mass mixing ratio in the calculation of the sedimentation velocity with a value that is eight times the original ice mass mixing ratio”. See
GeoMIP is clear about what the forcing should achieve (reduction in radiative forcing from rcp8.5 to rcp4.5) but leaves it open to the modelling groups to choose a method that best suits their aerosol scheme.
See
The supplement related to this article is available online at:
CP represented ES-DOC in discussions with CMIP6 experiment designers, collecting information and influencing design. MJ was responsible for the data request. KET led the PCMDI involvement in experiment coordination. EG and BNL led various aspects of ES-DOC at different times. BNL and CP wrote the bulk of this paper, with contributions from the other authors.
The authors declare that they have no conflict of interest.
Clearly the CMIP6 design really depends on the many scientists involved in designing and specifying the experiments under the purview of the CMIP6 panel. The use of ES-DOC to describe experiments depends heavily on the tool chain, much of which was designed and implemented by Mark Morgan under the direction of Sébastien Denvil (CNRS/IPSL). Paul Durack was instrumental in the support for CMIP6 vocabularies at PCMDI. Most of the ES-DOC work described here has been funded by national capability contributions to NCAS from the UK Natural Environment Research Council (NERC) and by the European Commission under FW7 grant agreement 312979 for IS-ENES2. The writing of this paper was part funded by the European Commission via H2020 grant agreement no. 824084 for IS-ENES3. Work by Karl E. Taylor was performed under the auspices of the US Department of Energy (USDOE) by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344 with support from the Regional and Global Modeling Analysis Program of the USDOE's Office of Science.
This research has been supported by the UK Natural Environment Research Council (NERC) (grant National capability contributions to the National Centre for Atmospheric Science (NCAS)), the Seventh Framework Programme (grant no. IS-ENES2 (312979)), the U.S. Department of Energy, Office of Science (grant no. DE-AC52-07NA27344), and the European Commission Horizon 2020 (EC funded project: IS-ENES3 (824084)).
This paper was edited by Juan Antonio Añel and reviewed by Ron Stouffer and one anonymous referee.