the Creative Commons Attribution 4.0 License.
CP-DSL: Supporting Configuration and Parametrization of Ocean Models with UVic (2.9) and MITgcm (67w)
Abstract. Ocean models are long-living software systems facing challenges with increasing complexity, architecture erosion, and managing legacy code. These challenges increase maintenance costs in development and use, which reduces the time and resources available for research. Software engineering addresses these challenges by separation of concerns and modularization. One particular approach is to separate concerns by tailor-made notations, i.e. Domain-Specific Languages (DSLs). Using DSLs, the model developer can focus on one concern at a time without the need to consider other concerns of a software system simultaneously. In ocean and climate models, DSL tooling, like PSyclone and Dusk/Dawn, is used for instance to separate scientific and technical code.
CP-DSL complements this approach with a focus on configuration and parametrization, which play an important role in ocean models, especially in parameter optimization and scenario-based simulations. CP-DSL is designed to be model agnostic and provides a unified interface to different ocean models. Furthermore, the DSL can be integrated into tools and processes used by domain experts. In this paper we report on the DSL design, implementation, and the evaluation with scientists and research software engineers. The implementation of CP-DSL is available as open source software and a replication package for configuration and parameterization of UVic and MITgcm is provided.
Status: closed
RC1: 'Comment on gmd-2021-311', Andrew Porter, 10 Nov 2021
General points
The paper describes the development and evaluation of a Domain-Specific Language (DSL) for use in the configuration and parameterisation of ocean models. The new language ('CP-DSL') is evaluated in the context of the UVic and MITgcm ocean models but is intended to be generally applicable to others. Configuration and parameterisation of ocean models can be complex due to the number of scientific components involved and the historical evolution of the code base. This complexity presents a considerable barrier to adoption of a model and can also be the source of errors meaning that a scientist may not be running a simulation with precisely the options that they intend.
CP-DSL abstracts out the various mechanisms that may be used to configure a model (e.g. CPP directives, include files, Fortran namelists) and simply presents a user with named options and settings which may be grouped appropriately. As such, I think it is an approach that has value although I feel that the paper itself could do with making this case more strongly. Related to this, although there is a discussion of the evaluation of the features and use of CP-DSL, there is no mention of what the model developers themselves think - are they keen to adopt CP-DSL or do they have reservations?
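The abstraction described above can be illustrated with a minimal sketch. It is not CP-DSL and does not reflect its implementation; the option names, the `Option` class, and the assignment of each option to a mechanism are hypothetical and chosen only to show how named settings might be rendered either to a Fortran namelist or to CPP directives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A named setting plus the mechanism that carries it (hypothetical)."""
    name: str
    value: object
    mechanism: str   # "namelist" (run time) or "cpp" (compile time)
    group: str = ""  # namelist group name; unused for CPP options


def fortran_literal(value):
    """Format a Python value as a Fortran namelist literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return ".true." if value else ".false."
    if isinstance(value, str):
        return f"'{value}'"
    return str(value)


def render_namelists(options):
    """Render all namelist options, one &group ... / block per group."""
    groups = {}
    for opt in options:
        if opt.mechanism == "namelist":
            groups.setdefault(opt.group, []).append(opt)
    lines = []
    for group, opts in groups.items():
        lines.append(f"&{group}")
        for opt in opts:
            lines.append(f"  {opt.name} = {fortran_literal(opt.value)}")
        lines.append("/")
    return "\n".join(lines)


def render_cpp(options):
    """Render all CPP options as #define / #undef directives."""
    lines = []
    for opt in options:
        if opt.mechanism != "cpp":
            continue
        if opt.value is True:
            lines.append(f"#define {opt.name}")
        elif opt.value is False:
            lines.append(f"#undef {opt.name}")
        else:
            lines.append(f"#define {opt.name} {opt.value}")
    return "\n".join(lines)


# Hypothetical settings; the user only sees names and values.
settings = [
    Option("time_step", 3600, "namelist", "time"),
    Option("use_ice", True, "namelist", "time"),
    Option("ENABLE_ICE", True, "cpp"),
    Option("GRID_NX", 100, "cpp"),
]
```

In this sketch the user edits only the list of named settings, while the decision of whether a setting ends up in a namelist or a preprocessor header is recorded once per option.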
It seems that configuration/control of diagnostic outputs is at an early stage in CP-DSL but this is a critical and complex part of production jobs. I can see value in a common way of specifying diagnostics as this extends beyond ocean modelling: there is a relatively small number of IO systems and these tend to be common between e.g. atmosphere and ocean models.
Specific points
Although the discussion of the various roles played in the development of ocean models in Section 3.1 is interesting, I don't think it adds any value to the paper (which is primarily about the new DSL) and could be removed.
Figure 2 and its accompanying text mention that there are options that are common between ocean models. Given the subtleties that can occur when different scientists implement the same numerical scheme I think that determining such options could be problematic. I think some examples of such options would be helpful here. Is there a need for an agreed set of named quantities?
The UK Met Office uses 'Rose' for job configuration (see https://metomi.github.io/rose/doc/html/tutorial/rose/index.html#rose-tutorial) which has some similarities with the approach described in this paper. It is likely that other meteorological centres have similar configuration systems.
Technical corrections
(References are to page_number:line_number)
2:38 No comma after "as well as" and throughout the text.
3:87 PSyclone is developed by the UK Science and Technology Facilities Council's Hartree Centre in collaboration with the UK Met Office and the Australian Bureau of Meteorology. PSyclone has two 'modes' of operation: as an internal DSL (as used for the UK Met Office's LFRic atmosphere model) and as a code transformation tool (as used with the full NEMO ocean model).
4:89 "model developers what," should be "model developers which,"
4:105 "diagnoses" should be "diagnostics"
5:143 "responsible to control" should be "responsible for controlling"
7:186-189 "densities" should be "resolutions", "wide mesh" => "coarse mesh", "denser mesh" => "finer mesh",
"denser and wider meshes" => "fine and coarse meshes", "as it is the case" => "as is the case"
8:205 Not all models set grid sizes at compile time. This is a run-time option in NEMO for example.
8:214 Deployment of models is, I believe, something that the CYLC workflow engine (https://cylc.github.io/) does. Please compare.
9:231 No comma after e.g.
9:235 "built tools" => "build tools"
9:236 "whereby some are not available" => "which are not always available"
9:239 SVN is used as well as git (e.g. by NEMO)
9:252 "allows to draw" => "allows users to draw"
10:258 "derived form" => "derived from"
11:281 "look like to" => "look to"
11:294 "allows to have" => "allows"
14:325 "for a ocean" => "for an ocean"
15:330 "allows to import other" => "allows the import of other"
18:422 "Xtend" => "XTend"
20:473 "utilized in a joint session" - a joint session of what or whom?
20:474 What does "UVic is the reference simulation by GEOMAR of the University of Victoria model." mean?
A better reference for GOcean may be found at http://nora.nerc.ac.uk/id/eprint/521162/
Citation: https://doi.org/10.5194/gmd-2021-311-RC1 -
AC1: 'Reply on RC1', Wilhelm Hasselbring, 14 Jan 2022
Thanks for your valuable feedback on our submitted paper!
We respond to the comments that require changes.
Your comment:
- "As such, I think it is an approach that has value although I feel that the paper itself could do with making this case more strongly."
Reply:
- We will add comments from domain experts to highlight the advantages of the approach.
See also the following reply.
Your comment:
- "Related to this, although there is a discussion of the evaluation of the features and use of CP-DSL, there is no mention of what the model developers themselves think – are they keen to adopt CP-DSL or do they have reservations?"
Reply:
- We presented the CP-DSL to research software engineers and scientific modelers at GEOMAR, and they are (1) very much interested in a DSL in their domain of ocean system modeling and (2) interested in using the DSL. However, their responses only apply to the currently adapted case studies with UVic and MITgcm (as our paper title indicates).
Your comment:
- "It seems that configuration/control of diagnostic outputs is at an early stage in CP-DSL but this is a critical and complex part of production jobs. I can see value in a common way of specifying diagnostics as this extends beyond ocean modelling: there is a relatively small number of IO systems and these tend to be common between e.g. atmosphere and ocean models."
Reply:
- The diagnostics configuration in the DSL is indeed at an early stage and primarily motivated by the diagnostics features present in MITgcm. Its inclusion in the current DSL helps us to better engage in a follow-up discussion on how to specify this aspect. While it is possible to specify logging based on the parameter groups and module structure of the CP-DSL, we aim to provide specific structures for diagnostics and logging that allow users to be concise when specifying diagnostics and instruct them safely through specific DSL features.
We are aware of XIOS and other logging and diagnostics facilities for scientific models (cf. https://www.esiwace.eu/services/software-support/supXIOS). However, XIOS is seen by our interviewees as complicated and not applicable to all models. Our DSL could, with a suitable template for XIOS configuration generation, support XIOS configuration files. This is future work as mentioned in the paper.
Your comment:
- "Although the discussion of the various roles played in the development of ocean models in Section 3.1 is interesting, I don't think it adds any value to the paper (which is primarily about the new DSL) and could be removed."
Reply:
- The roles are important to understand the processes and thus determine the requirements for the DSL, which in turn lead to the design of the DSL. They are also relevant to understand which DSL addresses which role, i.e., the Declaration and Template specifications target the research software engineer, while the Configuration specification addresses the needs of the scientific modeler or model user. We will make this more explicit in the paper.
Your comment:
- "Figure 2 and its accompanying text mention that there are options that are common between ocean models. Given the subtleties that can occur when different scientists implement the same numerical scheme I think that determining such options could be problematic. I think some examples of such options would be helpful here. Is there a need for an agreed set of named quantities?"
Reply:
- The text for Figure 2 was, indeed, misleading, sorry. The Declaration and Template models are specific to one scientific model, while the Configuration model is specific to one scientific model setup. We have updated the text and illustration accordingly to clarify this.
Your comment:
- "The UK Met Office uses 'Rose' for job configuration (see https://metomi.github.io/rose/doc/html/tutorial/rose/index.html#rose-tutorial) which has some similarities with the approach described in this paper. It is likely that other meteorological centres have similar configuration systems."
Reply:
- Thanks for the pointer. 'Rose' is simpler than CP-DSL, e.g. only name-value pairs and sections are used, whereas CP-DSL uses higher-level concepts like groups of parameters and features with dependencies. However, we see 'Rose' as an important related work and now include it accordingly in Section 3.3.
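The difference between flat name-value pairs and features with dependencies can be sketched as follows. This is an illustration only: it is neither Rose nor CP-DSL syntax, and the feature names and the `missing_dependencies` helper are hypothetical. The point is that declared dependencies allow a tool to reject an inconsistent selection before the model is built.

```python
# Hypothetical declaration: each feature lists the features it requires.
DECLARED_FEATURES = {
    "ocean": [],
    "tracers": [],
    "sea_ice": ["ocean"],
    "biogeochemistry": ["ocean", "tracers"],
}


def missing_dependencies(declared, selected):
    """Map each selected feature to the required features not selected.

    Raises KeyError if a selected feature was never declared.
    """
    problems = {}
    for feature in sorted(selected):
        if feature not in declared:
            raise KeyError(f"undeclared feature: {feature}")
        missing = [req for req in declared[feature] if req not in selected]
        if missing:
            problems[feature] = missing
    return problems


# A selection that forgets a required feature is reported ...
incomplete = missing_dependencies(DECLARED_FEATURES, {"ocean", "biogeochemistry"})
# ... while a consistent selection yields no problems.
consistent = missing_dependencies(DECLARED_FEATURES, {"ocean", "sea_ice"})
```

A flat list of name-value pairs carries no such relationships, so the same check would have to be done by hand or by the model at build time.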
Your comment:
- "PSyclone is developed by the UK Science and Technology Facilities Council's Hartree Centre in collaboration with the UK Met Office and the Australian Bureau of Meteorology. PSyclone has two 'modes' of operation: as an internal DSL (as used for the UK Met Office's LFRic atmosphere model) and as a code transformation tool (as used with the full NEMO ocean model)."
Reply:
- We agree that PSyclone provides both applications and mention its use of an internal language as well as the extensions to Fortran code and its ability to transform these via in-place code transformation. To further clarify this, we add the following text: "The use of code transformation makes it possible to both write code and use PSyclone for optimizations in existing code that uses appropriate code structures."
Your comment:
- "Not all models set grid sizes at compile time. This is a run-time option in NEMO for example."
Reply:
- Thanks for this additional information. It is correct that grid sizes can also be a dynamic namelist option for some models; still, they can also be defined at compile time. Nevertheless, we agree with the point that there are also models that only allow grid sizes to be set at runtime. Accordingly, we have replaced the example and pointed out that models may implement options differently. We added the following text: "However, not all parameters can be set at runtime as optional modules enable or disable complete parts of the earth system model, e.g. atmosphere model. Thus, they are set at compile time. For example, in some models mesh and grid sizes are set at compile time."
Your comment:
- "Deployment of models is, I believe, something that the CYLC workflow engine (https://cylc.github.io/) does. Please compare."
Reply:
- In our paper, we focus on configuration and parametrization. Model deployment is an important issue, but it will be addressed in our project (OceanDSL) with a separate deployment DSL (not covered by the present paper). Nevertheless, we appreciate the interesting reference and have included it in the section as follows: "These requirements are addressed by other DSLs and tools, like the CYLC workflow engine~\citep{oliver2018cylc}."
Your comment:
- "SVN is used as well as git (e.g. by NEMO)."
Reply:
- This is correct, but according to our interviewed experts SVN is outdated and mostly replaced with Git. However, we extended the text to Section 5 to emphasize this fact: "Some model projects, like NEMO~\citep{NEMO2015}, still use SVN."
Your comment:
- "What does "UVic is the reference simulation by GEOMAR of the University of Victoria model." mean?"
Reply:
- The GEOMAR Helmholtz Centre for Ocean Research Kiel uses a refined version of the UVic ESCM model developed by the University of Victoria. To make this point explicit, we moved the paragraph and added the following paragraph to Section 8: "The GEOMAR Helmholtz Centre for Ocean Research Kiel uses a refined version of the UVic ESCM model developed by the University of Victoria~\citep{keller2012new}."
Your comment:
- "A better reference for GOcean may be found at http://nora.nerc.ac.uk/id/eprint/521162/"
Reply:
- Thank you for the reference. We also consider this reference appropriate and have replaced the existing one with the proposed one.
Citation: https://doi.org/10.5194/gmd-2021-311-AC1 -
AC2: 'Reply on RC1', Wilhelm Hasselbring, 14 Jan 2022
Sorry, I skipped one comment.
Your comment:
- ""utilized in a joint session" - a joint session of what or whom?"
Reply:
- The DSL was presented to the experts using an application prepared by us in order to incorporate their feedback. We have edited the text to reflect this: "The revised CP-DSL was reviewed and utilized in a joint session with the domain experts and the DSL developers to demonstrate its usage."
Thanks also for all the comments on typos, which we fixed in the paper.
Citation: https://doi.org/10.5194/gmd-2021-311-AC2
RC2: 'Comment on gmd-2021-311', Stephan Kramer, 23 Nov 2021
The manuscript "CP-DSL: Supporting Configuration and Parametrization of Ocean Models with UVic (2.9) and MITgcm (67w)" describes the implementation of a Domain Specific Language approach for the configuration of ocean models in the form of a new language CP-DSL. The subject of this paper addresses an important topic regarding the often complex configuration process of ocean models with a number of strongly conflicting issues like user friendliness for users with a wide range of expertise, the wide variety of competences involved in the construction of accurate and efficient ocean models which affect both the way the software is developed and its optimal configuration, and the fact that different models have quite different configuration processes, where the expertise of an advanced user of a specific model does not easily translate to that of another model. Although I believe a number of interesting ideas and techniques to address these challenges are presented in this manuscript, a number of issues in the presentation make it quite hard to evaluate what has actually been achieved.
The paper spends a good amount of time discussing the requirements of an ocean model configuration system (important), but there is considerably less attention to explaining what the actual objectives of this project are and how the chosen approach helps to achieve these. As the authors quite rightly state the added complexity and risks (in terms of new dependencies) need to provide benefits, but there is only a limited discussion of what these are, and in particular how the specific choices in this project deliver these. This, and the fact that the actual implementation is only described in a rather abstract way, with only a few restricted excerpts and no concrete examples, make it hard to judge to what extent such benefits are delivered by this project.
Specific issues throughout the manuscript:
- the most problematic section in my view is section 5 which provides a very abstract overview of the syntax of the proposed language but only through a few excerpts that do not give a very clear picture of the language as a whole. I also do not find the UML diagrams to be particularly enlightening (figures 3-6). I would really like to see a more complete overview of the features that have been implemented, and a lot more concrete examples of actually what goes into the "configuration model" vs. the "declaration model" so we can get a better idea how universal the language is and able to specify things in a model independent way.
- it's only in section 6 that we finally get told what CP-DSL is actually made of, but the description of the key components in section 6.1 is very terse. Please explain for instance what EMF is. As a side note, it also seems that some of these components bring in a dependency on a specific version of Java which seems to be in contrast with one of the requirements (line 251-253) and isn't particularly well supported on some of the HPC systems that ocean models run on.
- as mentioned before the authors do not really evaluate or discuss what has actually been achieved in this work. The evaluation by other users in section 7 is described in a rather superficial way. The first evaluation in section 7.1 seems to be about a quite different version of the language, so the only information we get about a second evaluation is "The division in general parameters and modules was considered useful. Also the reworked YAML syntax was rated easy to understand." This is actually the very first place (right at the end of the manuscript) where it is actually mentioned that the CP-DSL syntax is closely (?) related to YAML - the other places YAML is only mentioned to contrast with XML and JSON. From the conclusions:
"As this is an ongoing research project, we aim to further extend and improve CP-DSL in close contact with users and active scientists from the domain. We initially developed the DSL for a representative subset of MITgcm ocean modeling scenarios and are currently evaluating it to be able to support all modules of MITgcm. However, the current syntax for diagnostics caused a larger comment by our domain experts, as diagnostics are an important topic in climate modeling for various purposes."
Having read the paper I still have little idea what representative subset has actually been implemented.
- I tried to evaluate the software, also to get a better idea about the structure and functionality of the language, but I'm afraid I didn't get very far. There is no documentation at all that is directly accessible, just some installation instructions that were unfortunately insufficient to guide a user, like me, with no experience of using java, maven, etc. For me it is very unclear what the different parts (cp-dsl, cp-dsl-replication, cp-dsl-jupyter-kernel) consist of and how they are supposed to work together, and how they should be installed such that the different components can access each other.
- As discussed ocean modelling already brings together a variety of expertises (e.g. oceanography, numerical analysis and HPC) and the CS flavoured approach followed in this paper with DSLs, context-free grammars and metamodels adds a whole layer on top of that. This makes it important to have a clear view of the intended audience and adjust the language to it, briefly explaining key concepts. In particular if the intended audience is ocean model developers, who may be familiar with many advanced computational and numerical techniques, but not with tools and terminology common in DSL approaches, some more guidance would be helpful. Here it is also important to be aware of how terminology varies between different communities and be specific about what definition is being used. As an example, the authors already point out that the concept of language models adds a new meaning to the word model in the ocean modelling context, but within the ocean modelling community the word already has a range of meanings: a conceptual model of the global ocean, a description of its physics, a translation of that into a mathematical model, which in turn is translated into numerical equations whose specific software implementation is also referred to as an ocean model, and finally a specific configuration of such a model for a specific scenario is again referred to as an ocean model. In this paper the authors choose the (appropriate) definition of a specific software implementation (line 16), but then in line 140 we have "The Model Developer is a software developer and responsible for transferring the ocean models into code" which contradicts that definition and makes it hard to understand what the difference between a "Scientific Modeler" and a "Model Developer" is. As another example, the authors make a distinction between configuration and parameterisation. Here it is important to note that "parameterisation" already has a very specific meaning in the ocean (and atmosphere) modelling community: it refers to processes that are not modelled through PDEs on a numerical grid, but rather through empirical parameterisation of these processes (typically on the sub-grid scale). "parameter selection" might be a better description of what is meant in the paper.
Configuration is defined in line 37 as "the selection of features and code to be used for the model, as well as, the build configuration." but then on line 268:
"For each simulation experiment, we need to define a configuration and a parametrization. Independent of a specific experiment, we declare settings that are specific to an ocean model. Thus, the parameters and configurable features are declared with CP-DSL in a Declaration Model specific to each supported ocean model, as depicted in Figure 2. The Configuration Model is independent of a specific an ocean model, it defines the settings of a concrete experiment, whereby the declarations in the Declaration Model for the specific ocean model are referenced. This way, we separate the ocean-model-independent and the ocean-model-dependent settings"
which seems to bring an entirely new definition of configuration which is used along with the old in the same paragraph.
As a final example, the words "deploy" and "deployment" are used in a number of places: "Model developers also deploy the software" (line 141), "the deployment is merely configuration and parametrization" (line 163) - and I don't really understand what is meant there - I'm more familiar with its usage as in line 207-214.
Some suggestions for further references
Note: these are suggestions from personal experience only, I don't think the current references are lacking
Section 2 provides a fair overview of previous DSL approaches in the ocean modelling context. Although section 4 gives a good overview of the specific requirements for a configuration system for an ocean model, there isn't a clear separation between those requirements that are specific to ocean models, and those which are in common with the configuration of other types of scientific modelling software, e.g. atmosphere models. As one of the key decisions in the design of DSLs is based around finding the right level of abstraction, it would be worthwhile to extend this discussion to scientific models in general and explain why an ocean-modelling specific language is required, or whether it could be built on top of a more generic approach for the configuration of scientific models in general. As an example of the latter, I have personally been involved in SPUD [5], an XML + RELAX NG based configuration system for scientific computer models (I do share the authors' preference for more plain text formats btw).
Already mentioned are PSyclone, Dusk/Dawn and Sprat which target other layers of the software stack, in particular PDE discretisation in combination with automated code generation. In this context it might be worth mentioning the popular FEniCS [1], Firedrake [2], and DUNE [3] projects which make extensive use of such approaches (for context I'm one of the authors of Thetis [4], a coastal ocean model based on Firedrake). You do mention ICON in the context of diagnostic configuration, but I believe there is more DSL-based ICON development described in [6]. Finally the Atmospheric Modelling Language (ATMOL) [7, 8] developed with the Royal Netherlands Meteorological Institute may be worth a mention.
[1] M. S. Alnaes, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M. E. Rognes and G. N. Wells. The FEniCS Project Version 1.5, Archive of Numerical Software 3 (2015). [doi.org/10.11588/ans.2015.100.20553]
[2] Florian Rathgeber, David A. Ham, Lawrence Mitchell, Michael Lange, Fabio Luporini, Andrew T. T. Mcrae, Gheorghe-Teodor Bercea, Graham R. Markall, and Paul H. J. Kelly. Firedrake: automating the finite element method by composing abstractions. ACM Trans. Math. Softw., 43(3):24:1–24:27, 2016. doi:10.1145/2998441
[3] Peter Bastian et al. "The Dune framework: Basic concepts and recent developments", https://doi.org/10.1016/j.camwa.2020.06.007
[4] Tuomas Kärnä et al, "Thetis coastal ocean model: discontinuous Galerkin discretization for the three-dimensional hydrostatic equations", GMD, 11, 4359–4382, 2018, https://doi.org/10.5194/gmd-11-4359-2018
[5] Ham, D. A., et al. "Spud 1.0: generalising and automating the user interfaces of scientific computer models." Geoscientific Model Development 2.1 (2009): 33-42
[6] R. Torres et al. "ICON DSL: A Domain-Specific Language for climate modeling" http://sc13.supercomputing.org/sites/default/files/WorkshopsArchive/pdfs/wp127s1.pdf
[7] Robert A. van Engelen, "ATMOL: A Domain-Specific Language for Atmospheric Modeling", Journal of computing and information technology 9.4 (2001): 289-303.
[8] Paul van der Mark, "A Case Study for Automatic Code Generation on a Coupled Ocean—Atmosphere Model." International Conference on Computational Science. Springer, Berlin, Heidelberg, 2002.
Some smaller specific comments:
line 89: "by model developers what, in principle, allows" - I think you mean "by model developers which, in principle, allows"
lines 186-190: I would suggest replacing grid density, dense and wide with grid resolution, fine (grid resolution), and coarse which are more common
section 3.5: I don't really understand the difference being made between versions and variants, and how it is relevant to the rest of the paper
line 203: why mention openmp but not MPI (the Message Passing Interface not Max Planck!)
line 239-240: "Code management and sharing via tar-balls, ssh and email is mostly deprecated and replaced by Git, often with the Git Large File Storage (Git LFS) extension." I think that's a little too strong; the use of ssh and shared file systems is still very common practice.
line 242: what is meant by two layers?
Citation: https://doi.org/10.5194/gmd-2021-311-RC2 -
AC3: 'Reply on RC2', Wilhelm Hasselbring, 14 Jan 2022
Thanks for your valuable feedback on our submitted paper!
We respond to the comments that require changes.
Your comment:
- "The paper spends a good amount of time discussing the requirements of an ocean model configuration system (important), but there is considerable less attention to explaining what the actual objectives of this project are and how the chosen approach helps to achieve these. As the authors quite rightly state the added complexity and risks (in terms of new dependencies) need to provide benefits, but there is only a limited discussion of what these are, and in particular how the specific choices in this project deliver these."
Reply:
- We agree that the goals and objectives should be more specific. We have revised Section 6 to reflect this and have explicitly linked the design decisions to the challenges outlined in the introduction. We have also expanded the paragraph in the introduction to state this more clearly.
Your comment:
- "This, and the fact that the actual implementation is only described in a rather abstract way, with only a few restricted excerpts and no concrete examples, make it hard to judge to what extent such benefits are delivered by this project."
Reply:
- We extended the examples in the paper significantly and, instead of presenting portions of them in separate listings, we integrated them into a longer listing giving a better example of what the specification files look like. Furthermore, we extended the replication package with complete step-by-step instructions on how to repeat example setups with UVic and MITgcm.
Your comment:
- "I would really like to see a more complete overview of the features that have been implemented, and a lot more concrete examples of actually what goes into the "configuration model" vs. the "declaration model" so we can get a better idea how universal the language is and able to specify things in a model independent way."
Reply:
- The examples added for the previous comment were designed to demonstrate the features in detail. See the new Listings 1 through 5 in Section 6, which are a full setup for UVic and an excerpt for MITgcm.
Your comment:
- "the most problematic section in my view is section 5 which provides a very abstract overview of the syntax of the proposed language but only through a few excerpts that do not give a very clear picture of the language as a whole."
Reply:
- Similar to a complete example, a complete syntax would also be very extensive within the paper. Therefore, the replication package was further expanded to emphasize this. We also provide the complete grammars in the replication package, including references to the actual grammars in the code base.
Your comment:
- "I also do not find the UML diagrams to be particularly enlightening (figures 3-6)."
Reply:
- The diagrams show the DSL implementation in a compact form that would otherwise be very extensive. They represent the abstract syntax of the DSLs. Nevertheless, we have revised Section 5 (now Section 6) to address your concern.
Your comment:
- "it's only in section 6 that we finally get told what CP-DSL is actually made of, but the description of the key components in section 6.1 is very terse. Please explain for instance what EMF is."
Reply:
- We have added short explanations for the used technology in that section (now Section 7.1). An excerpt regarding EMF from the additions to the text of this section: "XText is a DSL development framework and toolchain which provides its own IDE and allows to create a DSL, generator, editor and other facilities for a DSL mainly based on a grammar specification. XTend is the template and programming language used with XText and served as inspiration for our Template DSL. Finally, EMF is an implementation of the essential subset of the Meta-Object Facility (EMOF)~\citep{mof20042} that allows to specify metamodels. XText uses EMF to model the abstract syntax of grammars."
Your comment:
- "As a side note, it also seems that some of these components bring in a dependency on a specific version of Java which seems to be in contrast with one the requirements (line 251-253) and isn't particularly well supported on some of the HPC systems that ocean models run on."
Reply:
- We agree that this is a potential risk as we use a technology not widely used in the domain. However, we used this technology stack, as it allows for agile language development, i.e., we can create and modify the DSL in a short time and gain feedback from users on a working prototype. This allows us to advance faster and get closer to the users' needs. Furthermore, on the HPC systems in our partner organizations, Java is available. It is also possible to execute our tools in a Docker container, e.g., utilizing Singularity. In particular, the Jupyter setup is designed for this purpose. On standard workstations, Java is not an issue. As mitigation of this risk, we provide the abstract syntax and metamodel to facilitate porting the parser and code generator to another technology stack, if necessary.
Your comment:
- "as mentioned before the authors do not really evaluate or discuss what has actually been achieved in this work. The evaluation by other users in section 7 is described in a rather superficial way. The first evaluation in section 7.1 seems to be about a quite different version of the language, so the only information we get, about a second evaluation is "The division in general parameters and modules was considered useful."
Reply:
- The evaluation is primarily driven by our two case studies and uses (a) setups for existing experiments as examples, and (b) reviews by research software engineers. We have revised the evaluation section to better reflect this. We also added details of the example used and further comments on the DSL from domain experts to Section 8.
Your comment:
- ""Also the reworked YAML syntax was rated easy to understand." This is actually the very first place (right at the end of the manuscript) where it is actually mentioned that the CP-DL syntax is closely (?) related to YAML - the other places YAML is only mentioned to contrast with XML and JSON."
Reply:
- We added YAML as the source for our concrete syntax design in the revision of Section 6. This was the result of our first evaluation step. While the abstract syntax remained the same, we adopted concrete syntax elements from YAML.
Your comment:
- "As discussed ocean modelling already brings together a variety of expertises (e.g. oceanography, numerical analysis and HPC) and the CS flavoured approach followed in this paper with DSLs, context-free grammars and metamodels adds a whole layer on top of that. This makes it important to have a clear view of the intended audience and adjust the language to it, briefly explaining key concepts. In particular if the intended audience is ocean model developers, who may be familiar with many advanced computational and numerical techniques, but not with tools and terminology common in DSL approaches, some more guidance would be helpful."
Reply:
- We are aware that the terminology is ambiguous. We therefore added a new section on terminology to introduce all necessary terms, see Section 2. Our DSL adds another tool to the toolchain of research software engineers or model developers. However, they do not need to understand the building blocks of the DSL. Instead, they can focus on the declaration of parameters and configuration options (Declaration view) and on the specification of configurations and parameter selection (Configuration view). These two views separate the concerns of model setup and include additional checks to ensure working setups. This reduces mistakes and errors. As the DSL supports a wide range of different output files for each ocean model, users no longer need to comprehend all the different syntaxes and conventions, because only the syntax of the Configuration DSL is used.
Your comment:
- "Here it is also important to be aware of how terminology various between different communities and be specific about what definition is being used. As an example, the authors already point out that the concept of language models adds a new meaning to word model in the ocean modelling context, but within the ocean modelling community the word already has a range of meanings: a conceptual model of the global ocean, a description of its physics, a translation of that into a mathematical model, which in turn are translated into numerical equations whose specific software implementation is also referred to as an ocean model, and finally a specific configuration of such a model for a specific scenario is again referred to as an ocean model. In this paper the authors choose the (appropriate) definition of a specific software implementation (line 16), but then in line 140 we have "The Model Developer is a software developer and responsible for transferring the ocean models into code" which contradicts that definition and makes it hard to understand what the difference between a "Scientific Modeler" and a "Model Developer" is."
Reply:
- As the terminology was ambiguous in this context, we have expanded Section 4 and changed "model developer" to the more fitting term "Research Software Engineer" as this provides a better description of this role.
Your comment:
- "As another example, the authors make a distinction between configuration and parameterisation. Here it is important to note that "parameterisation" already has a very specific meaning in the ocean (and atmosphere) modelling community, it refers to processes that are not modelled through PDEs on a numerical grid, but rather through empirical parameterisation of these processes (typically on the sub-grid scale). "parameter selection" might be a better description of what is meant in the paper."
Reply:
- We agree that the proposed term is more fitting than the one used previously and have replaced it with the proposed one, i.e., "parameter selection". Thanks for this suggestion.
Your comment:
- "Configuration is defined in line 37 as "the selection of features and code to beused for the model, as well as, the build configuration." but then on line 268: "For each simulation experiment, we need to define a configuration and a parametrization. Independent of a specific experiment, we declare settings that are specific to an ocean model. Thus, the parameters and configurable features are declared with CP-DSL in a Declaration Model specific to each supported ocean model, as depicted in Figure 2. The Configuration Model is independent of a specific an ocean model, it defines the settings of a concrete experiment, whereby the declarations in the Declaration Model for the specific ocean model are referenced. This way, we separate the ocean-model-independent and the ocean-model-dependent settings." which seems to bring an entire new definition of configuration which is used along with the old in the same paragraph."
Reply:
- Based on your review, we agree that the explanations seem to contradict themselves; accordingly, we rephrased these paragraphs. We also added a terminology section to the paper for further clarification. See also our next reply.
We added the following text: "For each simulation experiment, we need to define a configuration and a parameter selection. However, portions of experiment settings might be identical across several configurations, e.g., in parameter optimization or to test scientific model stability regarding certain parameter changes. These definitions are stored in a Configuration Model. Independent of a specific experiment, we need to declare which settings are genuine to a specific scientific model. This makes the CP-DSL agnostic to specific scientific models, as the declarations are stored in a Declaration Model. The Declaration Model declares the parameters and configurable features available for a specific scientific model, as depicted in Figure 3, and serve a similar purpose as declaring data types in a programming language."
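The separation described in the added text can be illustrated with a minimal Python sketch. It is not CP-DSL syntax and not the actual implementation; the class names, the method names and the parameters `time_step` and `viscosity` are hypothetical and chosen only for illustration. The sketch assumes that a Declaration Model lists the parameters a scientific model offers, together with types and defaults, and that a Configuration Model may only select values for parameters that have been declared.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParameterDeclaration:
    """One declared parameter of a scientific model (hypothetical structure)."""
    name: str
    type_: type
    default: object


@dataclass
class DeclarationModel:
    """Declares which parameters exist for one specific scientific model."""
    model_name: str
    parameters: dict = field(default_factory=dict)

    def declare(self, name, type_, default):
        self.parameters[name] = ParameterDeclaration(name, type_, default)


@dataclass
class ConfigurationModel:
    """Settings of one experiment; every setting must reference a declaration."""
    declaration: DeclarationModel
    settings: dict = field(default_factory=dict)

    def select(self, name, value):
        decl = self.declaration.parameters.get(name)
        if decl is None:
            # Comparable to using an undeclared type in a programming language.
            raise KeyError(f"'{name}' is not declared for {self.declaration.model_name}")
        if not isinstance(value, decl.type_):
            raise TypeError(f"'{name}' expects {decl.type_.__name__}")
        self.settings[name] = value

    def resolve(self):
        """Return declared defaults overridden by the experiment's selections."""
        resolved = {n: d.default for n, d in self.declaration.parameters.items()}
        resolved.update(self.settings)
        return resolved


# Hypothetical usage: one declaration, one experiment configuration.
declaration = DeclarationModel("demo_ocean_model")
declaration.declare("time_step", float, 3600.0)
declaration.declare("viscosity", float, 1.0e-3)

experiment = ConfigurationModel(declaration)
experiment.select("time_step", 1800.0)
```

In this sketch, several experiments can share one `DeclarationModel` and differ only in their selections, which corresponds to the parameter optimization use case mentioned in the quoted text.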
Your comment:
- "As a final example, the word "deploy" and "deployment" is used in a number of places: "Model developers also deploy the software" (line 141), "the deployment is merely configuration and parametrization" (line 163) - and I don't really understand what is meant there - I'm more familiar with its usage as in line 207-214."
Reply:
- For a clearer explanation of these terms, we have included them in the new terminology Section 2.
Your comment:
- "As one of the key decisions in the design of DSLs is based around finding the right level of abstraction, it would be worth to extend this discussion to scientific models in general and explain why an ocean-modelling specific language is required, or whether it could be built on top of a more generic approach for the configuration of scientific models in general."
Reply:
- Our project started from ocean models, and during the development of CP-DSL we observed similarities with other models in Earth System Climate Models. We therefore see the case for supporting more scientific models in principle, but find that the requirements of specific domains differ too much. In addition, such a generalization would require many more examples at this time.
Your comment:
- "Already mentioned are Psyclone, Dusk/Dawn and Sprat which target other layers of the software stack, in particular PDE discretization in combination with automated code generation. In this context it might be worth mentioning the popular FEnics [1] and Firedrake [2], and DUNE [3] projects which make extensive use of such approaches (for context I'm one of the authors of Thetis [4] a coastal ocean model based on Firedrake). You do mention ICON in the context of diagnostic configuration, but I believe there is more DSL-based ICON development described in [6]. Finally the Atmospheric Modelling Language (ATMOL) [7, 8] developed with the Royal Netherlands Meteorological Institute may be worth a mention."
Reply:
- So far, we considered recently used projects, but agree that ATMOL should be mentioned as one of the first DSLs for weather and climate modeling. Thus, we have expanded Section 3.1. Similarly, since the FEniCS, Firedrake, and DUNE frameworks also use DSLs for the high-level specification of model equations in scientific modeling, we have included them accordingly. As for the ICON DSL, we intended to emphasize the use of CDI-pio in the ICON Earth System Model. To clarify this, we replaced the reference with a more specific one. Nevertheless, we thank you for the valuable reference to the ICON DSL. It is now cited in Section 3.1.
Your comment:
- "section 3.5: I don't really understand the difference being made between versions and variants, and how it is relevant to the rest of the paper"
Reply:
- For further clarification of these terms, we have included them in the terminology Section 2. We have also rewritten the paragraph to clarify the difference and the importance: "As a development tool, the DSL must fit into an existing toolchain of the domain. A major concern is documentation and, in this context, versions and variants of the model. The DSL design should take this into account since it has to comply with established version control systems."
Your comment:
- "why mention openmp but not MPI (the Message Passing Interface not Max Planck!)"
Reply:
- We agree that MPI should be mentioned here to be complete and have added it to Sections 3.1 and 4.6.
Your comment:
- "Code management and sharing via tar-balls, ssh and email is mostly deprecated and replaced by Git, often with the Git Large File Storage (Git LFS) extension." I think that's a little too strong; the use of ssh and shared file systems is still very common practice."
Reply:
- This is true. Although ssh and shared file systems are still used, the experts we interviewed consider the advantages of Git to outweigh them, and they are replaced by Git where possible. Still, we rephrased the text in Section 5 to emphasize this fact: "Code management and sharing via tar-balls, ssh and email are used, but become less favorable among scientists who we interviewed."
Your comment:
- "what is meant by two layers?"
Reply:
- We agree that this needs more clarification and have extended the paragraph and added an explanatory figure: "The projects and developers use multiple repositories loosely following a repository hierarchy with the community repository at the top, followed by institute-wide repositories, research-group repositories and two layers of individual repositories labeled \emph{installation} and \emph{user} in Figure 2."
Thanks also for all the comments on typos, which we fixed in the paper.
Citation: https://doi.org/10.5194/gmd-2021-311-AC3
Status: closed
-
RC1: 'Comment on gmd-2021-311', Andrew Porter, 10 Nov 2021
General points
The paper describes the development and evaluation of a Domain-Specific Language (DSL) for use in the configuration and parameterisation of ocean models. The new language ('CP-DSL') is evaluated in the context of the UVic and MITgcm ocean models but is intended to be generally applicable to others. Configuration and parameterisation of ocean models can be complex due to the number of scientific components involved and the historical evolution of the code base. This complexity presents a considerable barrier to adoption of a model and can also be the source of errors meaning that a scientist may not be running a simulation with precisely the options that they intend.
CP-DSL abstracts out the various mechanisms that may be used to configure a model (e.g. CPP directives, include files, Fortran namelists) and simply presents a user with named options and settings which may be grouped appropriately. As such, I think it is an approach that has value although I feel that the paper itself could do with making this case more strongly. Related to this, although there is a discussion of the evaluation of the features and use of CP-DSL, there is no mention of what the model developers themselves think - are they keen to adopt CP-DSL or do they have reservations?
It seems that configuration/control of diagnostic outputs is at an early stage in CP-DSL but this is a critical and complex part of production jobs. I can see value in a common way of specifying diagnostics as this extends beyond ocean modelling: there is a relatively small number of IO systems and these tend to be common between e.g. atmosphere and ocean models.
Specific points
Although the discussion of the various roles played in the development of ocean models in Section 3.1 is interesting, I don't think it adds any value to the paper (which is primarily about the new DSL) and could be removed.
Figure 2 and its accompanying text mention that there are options that are common between ocean models. Given the subtleties that can occur when different scientists implement the same numerical scheme I think that determining such options could be problematic. I think some examples of such options would be helpful here. Is there a need for an agreed set of named quantities?
The UK Met Office uses 'Rose' for job configuration (see https://metomi.github.io/rose/doc/html/tutorial/rose/index.html#rose-tutorial) which has some similarities with the approach described in this paper. It is likely that other meteorological centres have similar configuration systems.
Technical corrections
(References are to page_number:line_number)
2:38 No comma after "as well as" and throughout the text.
3:87 PSyclone is developed by the UK Science and Technology Facilities Council's Hartree Centre in collaboration with the UK Met Office and the Australian Bureau of Meteorology. PSyclone has two 'modes' of operation: as an internal DSL (as used for the UK Met Office's LFRic atmosphere model) and as a code transformation tool (as used with the full NEMO ocean model).
4:89 "model developers what," should be "model developers which,"
4:105 "diagnoses" should be "diagnostics"
5:143 "responsible to control" should be "responsible for controlling"
7:186-189 "densities" should be "resolutions", "wide mesh" => "coarse mesh", "denser mesh" => "finer mesh",
"denser and wider meshes" => "fine and coarse meshes", "as it is the case" => "as is the case"
8:205 Not all models set grid sizes at compile time. This is a run-time option in NEMO for example.
8:214 Deployment of models is, I believe, something that the CYLC workflow engine (https://cylc.github.io/) does. Please compare.
9:231 No comma after e.g.
9:235 "built tools" => "build tools"
9:236 "whereby some are not available" => "which are not always available"
9:239 SVN is used as well as git (e.g. by NEMO)
9:252 "allows to draw" => "allows users to draw"
10:258 "derived form" => "derived from"
11:281 "look like to" => "look to"
11:294 "allows to have" => "allows"
14:325 "for a ocean" => "for an ocean"
15:330 "allows to import other" => "allows the import of other"
18:422 "Xtend" => "XTend"
20:473 "utilized in a joint session" - a joint session of what or whom?
20:474 What does "UVic is the reference simulation by GEOMAR of the University of Victoria model." mean?
A better reference for GOcean may be found at http://nora.nerc.ac.uk/id/eprint/521162/
Citation: https://doi.org/10.5194/gmd-2021-311-RC1 -
AC1: 'Reply on RC1', Wilhelm Hasselbring, 14 Jan 2022
Thanks for your valuable feedback on our submitted paper!
We respond to the comments that require changes.
Your comment:
- "As such, I think it is an approach that has value although I feel that the paper itself could do with making this case more strongly."
Reply:
- We will add comments from domain experts to highlight the advantages of the approach.
See also the following reply.
Your comment:
- "Related to this, although there is a discussion of the evaluation of the features and use of CP-DSL, there is no mention of what the model developers themselves think – are they keen to adopt CP-DSL or do they have reservations?"
Reply:
- We presented the CP-DSL to research software engineers and scientific modelers at GEOMAR, and they are (1) very much interested in a DSL in their domain of ocean system modeling and (2) interested in using the DSL. However, their responses only apply to the currently adapted case studies with UVic and MITgcm (as our paper title indicates).
Your comment:
- "It seems that configuration/control of diagnostic outputs is at an early stage in CP-DSL but this is a critical and complex part of production jobs. I can see value in a common way of specifying diagnostics as this extends beyond ocean modelling: there is a relatively small number of IO systems and these tend to be common between e.g. atmosphere and ocean models."
Reply:
- The diagnostics configuration in the DSL is indeed at an early stage and primarily motivated by the diagnostics features present in MITgcm. Its inclusion in the current DSL helps us to better engage in a follow-up discussion on how to specify this aspect. While it is possible to specify logging based on the parameter groups and module structure of the CP-DSL, we aim to provide specific structures for diagnostics and logging that allow users to be concise when specifying diagnostics and instruct them safely through specific DSL features.
We are aware of XIOS and other logging and diagnostics facilities for scientific models (cf. https://www.esiwace.eu/services/software-support/supXIOS). However, XIOS is seen by our interviewees as complicated and not applicable to all models. Our DSL could, with a suitable template for XIOS configuration generation, support XIOS configuration files. This is future work as mentioned in the paper.
Your comment:
- "Although the discussion of the various roles played in the development of ocean models in Section 3.1 is interesting, I don't think it adds any value to the paper (which is primarily about the new DSL) and could be removed."
Reply:
- The roles are important to understand the processes and thus determine the requirements for the DSL, which in turn lead to the design of the DSL. They are also relevant to understand which DSL addresses which role, i.e., the Declaration and Template specifications target the research software engineer, while the Configuration specification addresses the needs of the scientific modeler or model user. We will make this more explicit in the paper.
Your comment:
- "Figure 2 and its accompanying text mention that there are options that are common between ocean models. Given the subtleties that can occur when different scientists implement the same numerical scheme I think that determining such options could be problematic. I think some examples of such options would be helpful here. Is there a need for an agreed set of named quantities?"
Reply:
- The text for Figure 2 was, indeed, misleading, sorry. The Declaration and Template models are specific to one scientific model, while the Configuration model is specific to one scientific model setup. We have updated the text and illustration accordingly to clarify this.
Your comment:
- "The UK Met Office uses 'Rose' for job configuration (see https://metomi.github.io/rose/doc/html/tutorial/rose/index.html#rose-tutorial) which has some similarities with the approach described in this paper. It is likely that other meteorological centres have similar configuration systems."
Reply:
- Thanks for the pointer. 'Rose' is simpler than CP-DSL, e.g., it uses only name-value pairs and sections, whereas CP-DSL uses higher-level concepts like groups of parameters and features with dependencies. However, we see 'Rose' as important related work and now include it accordingly in Section 3.3.
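The difference between flat name-value pairs and features with dependencies can be sketched in Python as follows. This is an illustration under stated assumptions only: it reproduces neither the 'Rose' file format nor CP-DSL syntax, and the section name, the feature names (`ocean`, `sea_ice`) and the function `missing_dependencies` are hypothetical.

```python
from dataclasses import dataclass

# Flat style: sections containing name-value pairs. Nothing in the structure
# itself records that one option only makes sense when another is active.
flat_config = {
    "ocean_settings": {"time_step": "3600", "use_sea_ice": "true"},
}


@dataclass(frozen=True)
class Feature:
    """A selectable feature that may require other features (hypothetical)."""
    name: str
    requires: tuple = ()


def missing_dependencies(features, selected):
    """Map each selected feature to the required features that are not selected."""
    by_name = {feature.name: feature for feature in features}
    missing = {}
    for name in selected:
        unmet = [req for req in by_name[name].requires if req not in selected]
        if unmet:
            missing[name] = unmet
    return missing


# Hypothetical declaration: sea ice can only be enabled together with the ocean.
declared_features = [Feature("ocean"), Feature("sea_ice", requires=("ocean",))]
problems = missing_dependencies(declared_features, {"sea_ice"})
```

With dependencies declared explicitly, an incomplete feature selection can be reported before any model input file is generated.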
Your comment:
- "PSyclone is developed by the UK Science and Technology Facilities Council's Hartree Centre in collaboration with the UK Met Office and the Australian Bureau of Meteorology. PSyclone has two 'modes' of operation: as an internal DSL (as used for the UK Met Office's LFRic atmosphere model) and as a code transformation tool (as used with the full NEMO ocean model)."
Reply:
- We agree that PSyclone provides both applications and mention its use of an internal language as well as the extensions to Fortran code and its ability to transform these via in-place code transformation. To further clarify this, we add the following text: "The use of code transformation makes it possible to both write code and use PSyclone for optimizations in existing code that uses appropriate code structures."
Your comment:
- "Not all models set grid sizes at compile time. This is a run-time option in NEMO for example."
Reply:
- Thanks for this additional information. It is correct that grid sizes can be a dynamic namelist option in some models, while in others they are defined at compile time. We also agree that there are models that only allow grid sizes to be set at runtime. Accordingly, we have replaced the example and pointed out that models may implement options differently. We added the following text: "However, not all parameters can be set at runtime as optional modules enable or disable complete parts of the earth system model, e.g. atmosphere model. Thus, they are set at compile time. For example, in some models mesh and grid sizes are set at compile time."
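A minimal Python sketch of this distinction is given below. It assumes that each declared parameter is tagged with the point in time at which it is bound, and that compile-time settings end up as preprocessor definitions while runtime settings end up in a Fortran namelist. The names `grid_nx`, `time_step` and `run_settings`, as well as the rendering functions, are hypothetical and do not reflect the generators of CP-DSL.

```python
from enum import Enum


class BindingTime(Enum):
    COMPILE_TIME = "compile"
    RUNTIME = "runtime"


def split_by_binding_time(declared, settings):
    """Separate selected settings into compile-time and runtime groups."""
    compile_time, runtime = {}, {}
    for name, value in settings.items():
        if declared[name] is BindingTime.COMPILE_TIME:
            compile_time[name] = value
        else:
            runtime[name] = value
    return compile_time, runtime


def render_defines(compile_time):
    """Render compile-time settings as C preprocessor definitions."""
    return "\n".join(
        f"#define {name.upper()} {value}"
        for name, value in sorted(compile_time.items())
    )


def render_namelist(group, runtime):
    """Render runtime settings as a Fortran namelist group."""
    lines = [f"&{group}"]
    lines += [f"  {name} = {value}," for name, value in sorted(runtime.items())]
    lines.append("/")
    return "\n".join(lines)


# Hypothetical model in which the grid size is fixed at compile time.
declared = {
    "grid_nx": BindingTime.COMPILE_TIME,
    "time_step": BindingTime.RUNTIME,
}
compile_part, runtime_part = split_by_binding_time(
    declared, {"grid_nx": 100, "time_step": 3600.0}
)
```

For a model that reads the grid size at runtime, only the tag in `declared` would change; the experiment settings themselves stay the same.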
Your comment:
- "Deployment of models is, I believe, something that the CYLC workflow engine (https://cylc.github.io/) does. Please compare."
Reply:
- In our paper, we focus on configuration and parametrization. Model deployment is an important issue, but it will be addressed in our project (OceanDSL) with a separate deployment DSL (not covered by the present paper). Nevertheless, we appreciate the interesting reference and have included it in the section as follows: "These requirements are addressed by other DSLs and tools, like the CYLC workflow engine~\citep{oliver2018cylc}."
Your comment:
- "SVN is used as well as git (e.g. by NEMO)."
Reply:
- This is correct, but according to the experts we interviewed, SVN is outdated and mostly replaced with Git. However, we extended the text in Section 5 to reflect this: "Some model projects, like NEMO~\citep{NEMO2015}, still use SVN."
Your comment:
- "What does "UVic is the reference simulation by GEOMAR of the University of Victoria model." mean?"
Reply:
- The GEOMAR Helmholtz Centre for Ocean Research Kiel uses a refined version of the UVic ESCM model developed by the University of Victoria. To make this point explicit, we moved the paragraph and added the following paragraph to Section 8: "The GEOMAR Helmholtz Centre for Ocean Research Kiel uses a refined version of the UVic ESCM model developed by the University of Victoria~\citep{keller2012new}."
Your comment:
- "A better reference for GOcean may be found at http://nora.nerc.ac.uk/id/eprint/521162/"
Reply:
- Thank you for the reference. We also consider this reference appropriate and replaced the existing reference with the proposed one.
Citation: https://doi.org/10.5194/gmd-2021-311-AC1 -
AC2: 'Reply on RC1', Wilhelm Hasselbring, 14 Jan 2022
Sorry, I skipped one comment.
Your comment:
- ""utilized in a joint session" - a joint session of what or whom?"
Reply:
- The DSL was presented to the experts using an application prepared by us in order to incorporate their feedback. We have edited the text to reflect this: "The revised CP-DSL was reviewed and utilized in a joint session with the domain experts and the DSL developers to demonstrate its usage."
Thanks also for all the comments on typos, which we fixed in the paper.
Citation: https://doi.org/10.5194/gmd-2021-311-AC2
-
RC2: 'Comment on gmd-2021-311', Stephan Kramer, 23 Nov 2021
The manuscript "CP-DSL: Supporting Configuration and Parametrization of Ocean Models with UVic (2.9) and MITgcm (67w)" describes the implementation of a Domain Specific Language approach for the configuration of ocean models in the form of a new language CP-DSL. The subject of this paper addresses an important topic regarding the often complex configuration process of ocean models with a number of strongly conflicting issues like user friendliness for users with a wide range of expertises, the wide variety of competences involved in the construction of accurate and efficient ocean models which affect both the way the software is developed but also its optimal configuration, and the fact that different models have quite different configuration processes, where the expertise of an advanced user of a specific model does not easily translate to that of another model. Although I believe a number of interesting ideas and techniques to address these challenges are presented in this manuscript, a number of issues in the presentation make it quite hard to evaluate what has actually been achieved.
The paper spends a good amount of time discussing the requirements of an ocean model configuration system (important), but there is considerably less attention to explaining what the actual objectives of this project are and how the chosen approach helps to achieve these. As the authors quite rightly state, the added complexity and risks (in terms of new dependencies) need to provide benefits, but there is only a limited discussion of what these are, and in particular how the specific choices in this project deliver these. This, and the fact that the actual implementation is only described in a rather abstract way, with only a few restricted excerpts and no concrete examples, make it hard to judge to what extent such benefits are delivered by this project.
Specific issues throughout the manuscript:
- the most problematic section in my view is section 5 which provides a very abstract overview of the syntax of the proposed language but only through a few excerpts that do not give a very clear picture of the language as a whole. I also do not find the UML diagrams to be particularly enlightening (figures 3-6). I would really like to see a more complete overview of the features that have been implemented, and a lot more concrete examples of actually what goes into the "configuration model" vs. the "declaration model" so we can get a better idea how universal the language is and able to specify things in a model independent way.
- it's only in section 6 that we finally get told what CP-DSL is actually made of, but the description of the key components in section 6.1 is very terse. Please explain for instance what EMF is. As a side note, it also seems that some of these components bring in a dependency on a specific version of Java which seems to be in contrast with one of the requirements (line 251-253) and isn't particularly well supported on some of the HPC systems that ocean models run on.
- as mentioned before the authors do not really evaluate or discuss what has actually been achieved in this work. The evaluation by other users in section 7 is described in a rather superficial way. The first evaluation in section 7.1 seems to be about a quite different version of the language, so the only information we get about a second evaluation is "The division in general parameters and modules was considered useful. Also the reworked YAML syntax was rated easy to understand." This is actually the very first place (right at the end of the manuscript) where it is actually mentioned that the CP-DSL syntax is closely (?) related to YAML - in the other places YAML is only mentioned to contrast with XML and JSON. From the conclusions:
"As this is an ongoing research project, we aim to further extend and improve CP-DSL in close contact with users and active scientists from the domain. We initially developed the DSL for a representative subset of MITgcm ocean modeling scenarios and are currently evaluating it to be able to support all modules of MITgcm. However, the current syntax for diagnostics caused a larger comment by our domain experts, as diagnostics are an important topic in climate modeling for various purposes."
Having read the paper I still have little idea what representative subset has actually been implemented.
- I tried to evaluate the software, also to get a better idea about the structure and functionality of the language, but I'm afraid I didn't get very far. There is no documentation at all that is directly accessible, just some installation instructions that were unfortunately insufficient to guide a user, like me, with no experience of using Java, Maven, etc. For me it is very unclear what the different parts (cp-dsl, cp-dsl-replication, cp-dsl-jupyter-kernel) consist of and how they are supposed to work together, and how they should be installed such that the different components can access each other.
- As discussed ocean modelling already brings together a variety of expertises (e.g. oceanography, numerical analysis and HPC) and the CS flavoured approach followed in this paper with DSLs, context-free grammars and metamodels adds a whole layer on top of that. This makes it important to have a clear view of the intended audience and adjust the language to it, briefly explaining key concepts. In particular if the intended audience is ocean model developers, who may be familiar with many advanced computational and numerical techniques, but not with tools and terminology common in DSL approaches, some more guidance would be helpful. Here it is also important to be aware of how terminology varies between different communities and be specific about what definition is being used. As an example, the authors already point out that the concept of language models adds a new meaning to the word model in the ocean modelling context, but within the ocean modelling community the word already has a range of meanings: a conceptual model of the global ocean, a description of its physics, a translation of that into a mathematical model, which in turn is translated into numerical equations whose specific software implementation is also referred to as an ocean model, and finally a specific configuration of such a model for a specific scenario is again referred to as an ocean model. In this paper the authors choose the (appropriate) definition of a specific software implementation (line 16), but then in line 140 we have "The Model Developer is a software developer and responsible for transferring the ocean models into code" which contradicts that definition and makes it hard to understand what the difference between a "Scientific Modeler" and a "Model Developer" is. As another example, the authors make a distinction between configuration and parameterisation. Here it is important to note that "parameterisation" already has a very specific meaning in the ocean (and atmosphere) modelling community: it refers to processes that are not modelled through PDEs on a numerical grid, but rather through empirical parameterisation of these processes (typically on the sub-grid scale). "parameter selection" might be a better description of what is meant in the paper.
Configuration is defined in line 37 as "the selection of features and code to be used for the model, as well as, the build configuration." but then on line 268: "For each simulation experiment, we need to define a configuration and a parametrization. Independent of a specific experiment, we declare settings that are specific to an ocean model. Thus, the parameters and configurable features are declared with CP-DSL in a Declaration Model specific to each supported ocean model, as depicted in Figure 2. The Configuration Model is independent of a specific an ocean model, it defines the settings of a concrete experiment, whereby the declarations in the Declaration Model for the specific ocean model are referenced. This way, we separate the ocean-model-independent and the ocean-model-dependent settings" which seems to bring an entirely new definition of configuration which is used along with the old in the same paragraph.
As a final example, the words "deploy" and "deployment" are used in a number of places: "Model developers also deploy the software" (line 141), "the deployment is merely configuration and parametrization" (line 163) - and I don't really understand what is meant there - I'm more familiar with their usage as in lines 207-214.
Some suggestions for further references
Note: these are suggestions from personal experience only, I don't think the current references are lacking
Section 2 provides a fair overview of previous DSL approaches in the ocean modelling context. Although section 4 gives a good overview of the specific requirements for a configuration system for an ocean model, there isn't a clear separation between those requirements that are specific to ocean models, and those which are in common with the configuration of other types of scientific modelling software, e.g. atmosphere models. As one of the key decisions in the design of DSLs is based around finding the right level of abstraction, it would be worth extending this discussion to scientific models in general and explaining why an ocean-modelling specific language is required, or whether it could be built on top of a more generic approach for the configuration of scientific models in general. As an example of the latter, I have personally been involved in SPUD [5], an XML + RELAX NG based configuration system for scientific computer models (I do share the authors' preference for more plain text formats btw).
Already mentioned are PSyclone, Dusk/Dawn and Sprat, which target other layers of the software stack, in particular PDE discretisation in combination with automated code generation. In this context it might be worth mentioning the popular FEniCS [1], Firedrake [2], and DUNE [3] projects, which make extensive use of such approaches (for context I'm one of the authors of Thetis [4], a coastal ocean model based on Firedrake). You do mention ICON in the context of diagnostic configuration, but I believe there is more DSL-based ICON development described in [6]. Finally the Atmospheric Modelling Language (ATMOL) [7, 8] developed with the Royal Netherlands Meteorological Institute may be worth a mention.
[1] M. S. Alnaes, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M. E. Rognes and G. N. Wells. The FEniCS Project Version 1.5, Archive of Numerical Software 3 (2015). [doi.org/10.11588/ans.2015.100.20553]
[2] Florian Rathgeber, David A. Ham, Lawrence Mitchell, Michael Lange, Fabio Luporini, Andrew T. T. Mcrae, Gheorghe-Teodor Bercea, Graham R. Markall, and Paul H. J. Kelly. Firedrake: automating the finite element method by composing abstractions. ACM Trans. Math. Softw., 43(3):24:1–24:27, 2016. doi:10.1145/2998441
[3] Peter Bastian et al. "The Dune framework: Basic concepts and recent developments", https://doi.org/10.1016/j.camwa.2020.06.007
[4] Tuomas Kärnä et al, "Thetis coastal ocean model: discontinuous Galerkin discretization for the three-dimensional hydrostatic equations", GMD, 11, 4359–4382, 2018, https://doi.org/10.5194/gmd-11-4359-2018
[5] Ham, D. A., et al. "Spud 1.0: generalising and automating the user interfaces of scientific computer models." Geoscientific Model Development 2.1 (2009): 33-42
[6] R. Torres et al. "ICON DSL: A Domain-Specific Language for climate modeling" http://sc13.supercomputing.org/sites/default/files/WorkshopsArchive/pdfs/wp127s1.pdf
[7] Robert A. van Engelen, "ATMOL: A Domain-Specific Language for Atmospheric Modeling", Journal of computing and information technology 9.4 (2001): 289-303.
[8] Paul van der Mark, "A Case Study for Automatic Code Generation on a Coupled Ocean—Atmosphere Model." International Conference on Computational Science. Springer, Berlin, Heidelberg, 2002.
Some smaller specific comments:
line 89: "by model developers what, in principle, allows" - I think you mean "by model developers which, in principle, allows"
lines 186-190: I would suggest replacing grid density, dense and wide with grid resolution, fine (grid resolution), and coarse which are more common
section 3.5: I don't really understand the difference being made between versions and variants, and how it is relevant to the rest of the paper
line 203: why mention OpenMP but not MPI (the Message Passing Interface, not Max Planck!)?
line 239-240: "Code management and sharing via tar-balls, ssh and email is mostly deprecated and replaced by Git, often with the Git Large File Storage (Git LFS) extension." I think that's a little too strong; the use of ssh and shared file systems is still very common practice.
line 242: what is meant by two layers?
Citation: https://doi.org/10.5194/gmd-2021-311-RC2
AC3: 'Reply on RC2', Wilhelm Hasselbring, 14 Jan 2022
Thanks for your valuable feedback on our submitted paper!
We respond to the comments that require changes.
Your comment:
- "The paper spends a good amount of time discussing the requirements of an ocean model configuration system (important), but there is considerable less attention to explaining what the actual objectives of this project are and how the chosen approach helps to achieve these. As the authors quite rightly state the added complexity and risks (in terms of new dependencies) need to provide benefits, but there is only a limited discussion of what these are, and in particular how the specific choices in this project deliver these."
Reply:
- We agree that the goals and objectives should be more specific. We have revised Section 6 to reflect this and have explicitly linked the design decisions to the challenges outlined in the introduction. We have also expanded the paragraph in the introduction to state this more clearly.
Your comment:
- "This, and the fact that the actual implementation is only described in a rather abstract way, with only a few restricted excerpts and no concrete examples, make it hard to judge to what extent such benefits are delivered by this project."
Reply:
- We extended the examples in the paper significantly. Instead of presenting portions of them in separate listings, we integrated them into a longer listing that gives a better example of what the specification files look like. Furthermore, we extended the replication package with complete step-by-step instructions on how to repeat the example setups with UVic and MITgcm.
Your comment:
- "I would really like to see a more complete overview of the features that have been implemented, and a lot more concrete examples of actually what goes into the "configuration model" vs. the "declaration model" so we can get a better idea how universal the language is and able to specify things in a model independent way."
Reply:
- The examples added for the previous comment were designed to demonstrate the features in detail. See the new Listings 1 through 5 in Section 6, which are a full setup for UVic and an excerpt for MITgcm.
Your comment:
- "the most problematic section in my view is section 5 which provides a very abstract overview of the syntax of the proposed language but only through a few excerpts that do not give a very clear picture of the language as a whole."
Reply:
- As with a complete example, a complete syntax would be too extensive to present within the paper. Therefore, we expanded the replication package to cover this: it now provides the complete grammars, including references to the actual grammars in the code base.
Your comment:
- "I also do not find the UML diagrams to be particularly enlightening (figures 3-6)."
Reply:
- The diagrams show the DSL implementation in a compact form that would otherwise be very extensive. They represent the abstract syntax of the DSLs. Nevertheless, we have revised Section 5 (now Section 6) to address your concern.
Your comment:
- "it's only in section 6 that we finally get told what CP-DSL is actually made of, but the description of the key components in section 6.1 is very terse. Please explain for instance what EMF is."
Reply:
- We have added short explanations for the used technology in that section (now Section 7.1). An excerpt regarding EMF from the additions to the text of this section: "XText is a DSL development framework and toolchain which provides its own IDE and allows to create a DSL, generator, editor and other facilities for a DSL mainly based on a grammar specification. XTend is the template and programming language used with XText and served as inspiration for our Template DSL. Finally, EMF is an implementation of the essential subset of the Meta-Object Facility (EMOF)~\citep{mof20042} that allows to specify metamodels. XText uses EMF to model the abstract syntax of grammars."
Your comment:
- "As a side note, it also seems that some of these components bring in a dependency on a specific version of Java which seems to be in contrast with one the requirements (line 251-253) and isn't particularly well supported on some of the HPC systems that ocean models run on."
Reply:
- We agree that this is a potential risk, as we use a technology that is not widely used in the domain. However, we chose this technology stack because it allows for agile language development, i.e., we can create and modify the DSL in a short time and gain feedback from users on a working prototype. This allows us to advance faster and stay closer to the users' needs. Furthermore, Java is available on the HPC systems in our partner organizations. It is also possible to execute our tools in a Docker container, e.g., utilizing Singularity. In particular, the Jupyter setup is designed for this purpose. On standard workstations, Java is not an issue. To mitigate this risk, we provide the abstract syntax and metamodel to facilitate porting the parser and code generator to another technology stack, if necessary.
Your comment:
- "as mentioned before the authors do not really evaluate or discuss what has actually been achieved in this work. The evaluation by other users in section 7 is described in a rather superficial way. The first evaluation in section 7.1 seems to be about a quite different version of the language, so the only information we get, about a second evaluation is "The division in general parameters and modules was considered useful."
Reply:
- The evaluation is primarily driven by our two case studies and uses (a) setups for existing experiments as examples, and (b) reviews by research software engineers. We have revised the evaluation section to better reflect this. We also added details of the example used and further comments on the DSL from domain experts to Section 8.
Your comment:
- ""Also the reworked YAML syntax was rated easy to understand." This is actually the very first place (right at the end of the manuscript) where it is actually mentioned that the CP-DL syntax is closely (?) related to YAML - the other places YAML is only mentioned to contrast with XML and JSON."
Reply:
- We added YAML as a source for our concrete syntax design in the revision of Section 6. This was the result of our first evaluation step. While the abstract syntax remained the same, we adopted concrete syntax elements from YAML.
Your comment:
- "As discussed ocean modelling already brings together a variety of expertises (e.g. oceanography, numerical analysis and HPC) and the CS flavoured approach followed in this paper with DSLs, context-free grammars and metamodels adds a whole layer on top of that. This makes it important to have a clear view of the intended audience and adjust the language to it, briefly explaining key concepts. In particular if the intended audience is ocean model developers, who may be familiar with many advanced computational and numerical techniques, but not with tools and terminology common in DSL approaches, some more guidance would be helpful."
Reply:
- We are aware that the terminology is ambiguous. We therefore added a new section on terminology to introduce all necessary terms, see Section 2. Our DSL adds another tool to the toolchain of research software engineers or model developers. However, they do not need to understand the building blocks of the DSL. Instead, they can focus on the declaration of parameters and configuration options (Declaration view) and on the specification of configurations and parameter selection (Configuration view). These two views separate the concerns of model setup and include additional checks to ensure working setups. This reduces mistakes and errors. As the DSL supports a wide range of different output files for each ocean model, users no longer need to comprehend all the different syntaxes and conventions, as only the syntax of the Configuration DSL is used.
Your comment:
- "Here it is also important to be aware of how terminology various between different communities and be specific about what definition is being used. As an example, the authors already point out that the concept of language models adds a new meaning to word model in the ocean modelling context, but within the ocean modelling community the word already has a range of meanings: a conceptual model of the global ocean, a description of its physics, a translation of that into a mathematical model, which in turn are translated into numerical equations whose specific software implementation is also referred to as an ocean model, and finally a specific configuration of such a model for a specific scenario is again referred to as an ocean model. In this paper the authors choose the (appropriate) definition of a specific software implementation (line 16), but then in line 140 we have "The Model Developer is a software developer and responsible for transferring the ocean models into code" which contradicts that definition and makes it hard to understand what the difference between a "Scientific Modeler" and a "Model Developer" is."
Reply:
- As the terminology was ambiguous in this context, we have expanded Section 4 and changed "model developer" to the more fitting term "Research Software Engineer" as this provides a better description of this role.
Your comment:
- "As another example, the authors make a distinction between configuration and parameterisation. Here it is important to note that "parameterisation" already has a very specific meaning in the ocean (and atmosphere) modelling community, it refers to processes that are not modelled through PDEs on a numerical grid, but rather through empirical parameterisation of these processes (typically on the sub-grid scale). "parameter selection" might be a better description of what is meant in the paper."
Reply:
- We agree that the proposed term is more plausible than the one we used previously, so we replaced it with the proposed one, i.e., "parameter selection". Thanks for this suggestion.
Your comment:
- "Configuration is defined in line 37 as "the selection of features and code to be used for the model, as well as, the build configuration." but then on line 268: "For each simulation experiment, we need to define a configuration and a parametrization. Independent of a specific experiment, we declare settings that are specific to an ocean model. Thus, the parameters and configurable features are declared with CP-DSL in a Declaration Model specific to each supported ocean model, as depicted in Figure 2. The Configuration Model is independent of a specific an ocean model, it defines the settings of a concrete experiment, whereby the declarations in the Declaration Model for the specific ocean model are referenced. This way, we separate the ocean-model-independent and the ocean-model-dependent settings." which seems to bring an entire new definition of configuration which is used along with the old in the same paragraph."
Reply:
- Based on your review, we agree that the explanations seem to contradict themselves; accordingly, we rephrased these paragraphs. We also added a terminology section to the paper for further clarification. See also our next reply.
We added the following text: "For each simulation experiment, we need to define a configuration and a parameter selection. However, portions of experiment settings might be identical across several configurations, e.g., in parameter optimization or to test scientific model stability regarding certain parameter changes. These definitions are stored in a Configuration Model. Independent of a specific experiment, we need to declare which settings are genuine to a specific scientific model. This makes the CP-DSL agnostic to specific scientific models, as the declarations are stored in a Declaration Model. The Declaration Model declares the parameters and configurable features available for a specific scientific model, as depicted in Figure 3, and serve a similar purpose as declaring data types in a programming language."
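The split between the two models described in the added text can be illustrated with a minimal sketch. This is not CP-DSL syntax and not the authors' implementation; the class names, the model name `example_model`, and the parameters `time_step_seconds` and `use_ice_module` are all hypothetical, chosen only to show how a Declaration Model acts like a set of type declarations against which a Configuration Model is checked.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParameterDeclaration:
    """One parameter a scientific model exposes (hypothetical structure)."""
    name: str
    type_: type
    default: object


@dataclass
class DeclarationModel:
    """Model-specific part: which parameters exist for one scientific model."""
    model_name: str
    parameters: dict = field(default_factory=dict)

    def declare(self, name, type_, default):
        self.parameters[name] = ParameterDeclaration(name, type_, default)


@dataclass
class ConfigurationModel:
    """Experiment-specific part: selected values that reference declarations."""
    declarations: DeclarationModel
    selected: dict = field(default_factory=dict)

    def select(self, name, value):
        # Reject settings that the referenced Declaration Model does not know.
        decl = self.declarations.parameters.get(name)
        if decl is None:
            raise KeyError(f"{name!r} is not declared for {self.declarations.model_name}")
        # Reject values whose type does not match the declaration.
        if not isinstance(value, decl.type_):
            raise TypeError(f"{name!r} expects {decl.type_.__name__}")
        self.selected[name] = value

    def resolve(self):
        """Return all declared parameters, using defaults where nothing was selected."""
        return {
            name: self.selected.get(name, decl.default)
            for name, decl in self.declarations.parameters.items()
        }


# Hypothetical declarations for an imaginary model.
decl = DeclarationModel("example_model")
decl.declare("time_step_seconds", int, 3600)
decl.declare("use_ice_module", bool, False)

# One experiment overrides a single declared parameter.
cfg = ConfigurationModel(decl)
cfg.select("time_step_seconds", 1800)
settings = cfg.resolve()
```

In this sketch, several experiments could share one `DeclarationModel` while each holds its own `ConfigurationModel`, which mirrors the separation the quoted text describes.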
Your comment:
- "As a final example, the word "deploy" and "deployment" is used in a number of places: "Model developers also deploy the software" (line 141), "the deployment is merely configuration and parametrization" (line 163) - and I don't really understand what is meant there - I'm more familiar with its usage as in line 207-214."
Reply:
- For a clearer explanation of these terms, we have included them in the new terminology Section 2.
Your comment:
- "As one of the key decisions in the design of DSLs is based around finding the right level of abstraction, it would be worth to extend this discussion to scientific models in general and explain why an ocean-modelling specific language is required, or whether it could be built on top of a more generic approach for the configuration of scientific models in general."
Reply:
- The starting point of our project was ocean models, and during the development of CP-DSL we observed similarities with other models in Earth System Climate Models. We therefore see the case for supporting more scientific models, but find that the requirements of specific domains are too different. In addition, such a generalization would require many more examples at this time.
Your comment:
- "Already mentioned are Psyclone, Dusk/Dawn and Sprat which target other layers of the software stack, in particular PDE discretization in combination with automated code generation. In this context it might be worth mentioning the popular FEnics [1] and Firedrake [2], and DUNE [3] projects which make extensive use of such approaches (for context I'm one of the authors of Thetis [4] a coastal ocean model based on Firedrake). You do mention ICON in the context of diagnostic configuration, but I believe there is more DSL-based ICON development described in [6]. Finally the Atmospheric Modelling Language (ATMOL) [7, 8] developed with the Royal Netherlands Meteorological Institute may be worth a mention."
Reply:
- So far, we considered recently used projects, but agree that ATMOL should be mentioned as one of the first DSLs for weather and climate modeling. Thus, we have expanded Section 3.1. Similarly, since the FEniCS, Firedrake, and DUNE frameworks also use DSLs for high-level specification of model equations in scientific modeling, we have included them accordingly. As for the ICON DSL, we intended to emphasize the use of CDI-pio in the ICON Earth System Model. To clarify this, we replaced the reference with a more specific one. Nevertheless, we thank you for the valuable reference to the ICON DSL. It is now cited in Section 3.1.
Your comment:
- "section 3.5: I don't really understand the difference being made between versions and variants, and how it is relevant to the rest of the paper"
Reply:
- For further clarification of these terms, we have included them in the terminology Section 2. We have also rewritten the paragraph to clarify the difference and the importance: "As a development tool, the DSL must fit into an existing toolchain of the domain. A major concern is documentation and, in this context, versions and variants of the model. The DSL design should take this into account since it has to comply with established version control systems."
Your comment:
- "why mention openmp but not MPI (the Message Passing Interface not Max Planck!)"
Reply:
- We agree that MPI should be mentioned here to be complete and have added it to Sections 3.1 and 4.6.
Your comment:
- "Code management and sharing via tar-balls, ssh and email is mostly deprecated and replaced by Git, often with the Git Large File Storage (Git LFS) extension." I think that's a little too strong; the use of ssh and shared file systems is still very common practice."
Reply:
- This is true. Although ssh and shared file systems are still used, according to the experts we interviewed the advantages of Git outweigh them, and they are replaced by Git where possible. Still, we rephrased the text in Section 5 to reflect this: "Code management and sharing via tar-balls, ssh and email are used, but become less favorable among scientists who we interviewed."
Your comment:
- "what is meant by two layers?"
Reply:
- We agree that this needs more clarification and have extended the paragraph and added an explanatory figure: "The projects and developers use multiple repositories loosely following a repository hierarchy with the community repository at the top, followed by institute-wide repositories, research-group repositories and two layers of individual repositories labeled \emph{installation} and \emph{user} in Figure 2."
Thanks also for all the comments on typos, which we fixed in the paper.
Citation: https://doi.org/10.5194/gmd-2021-311-AC3
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote
---|---|---|---|---|---
1,434 | 309 | 55 | 1,798 | 35 | 29