sympl (v. 0.4.0) and climt (v. 0.15.3) – towards a flexible framework for building model hierarchies in Python
- 1Department of Meteorology, Stockholm University, 106 91 Stockholm, Sweden
- 2Atmospheric Sciences–Geophysics (ATG) Building, University of Washington, Seattle, Washington 98195-1640, USA
Correspondence: Joy Merwin Monteiro (firstname.lastname@example.org)
sympl (System for Modelling Planets) and climt (Climate Modelling and Diagnostics Toolkit) are an attempt to rethink climate modelling frameworks from the ground up. The aim is to use expressive data structures available in the scientific Python ecosystem along with best practices in software design to allow scientists to easily and reliably combine model components to represent the climate system at a desired level of complexity and to enable users to fully understand what the model is doing.
sympl is a framework which formulates the model in terms of a state that gets evolved forward in time or modified within a specific time by well-defined components. sympl's design facilitates building models that are self-documenting, are highly interoperable, and provide fine-grained control over model components and behaviour. sympl components contain all relevant information about the input they expect and output that they provide. Components are designed to be easily interchanged, even when they rely on different units or array configurations. sympl provides basic functions and objects which could be used in any type of Earth system model.
climt is an Earth system modelling toolkit that contains scientific components built using sympl base objects. These include both pure Python components and wrapped Fortran libraries. climt provides functionality requiring model-specific assumptions, such as state initialization and grid configuration. climt's programming interface designed to be easy to use and thus appealing to a wide audience.
Model building, configuration and execution are performed through a Python script (or Jupyter Notebook), enabling researchers to build an end-to-end Python-based pipeline along with popular Python data analysis and visualization tools.
The climate is a complex system composed of interacting subsystems (atmosphere, ocean, cryosphere, biosphere, chemosphere), each encompassing a broad range of physical and chemical processes with space and timescales spanning many orders of magnitude. Modelling and understanding the climate system in its entirety is a grand scientific and technological challenge. An increasingly recognized strategy for tackling this challenge is to build a hierarchy of models of varying complexity. Simpler models are more amenable to in-depth analysis and understanding; the insight gained from these simpler models can then be used to understand more complicated models, and so on (Held, 2005). Specifying which particular models should occupy each rung in such a hierarchy is necessarily a matter of subjective choice, and the questions of how to create a hierarchy and what models are desirable in a canonical hierarchy has generated extensive discussions (Jeevanjee et al., 2017). Our purpose here is to present a modelling framework which enables climate scientists to easily and transparently traverse the specific model hierarchy suiting their needs.
Designing and building a framework that facilitates traversing this hierarchy remains a challenge. The lack of flexibility of existing climate models forces scientists to spend a lot of time reading and modifying code to construct alternative model versions that should in principle be straightforward to build. Most modelling frameworks simply provide code to exchange information between different physical domains such as the atmosphere and the ocean (see Theurich et al., 2015, and Valcke et al., 2012, for a survey of modelling frameworks), with each physical domain being represented by a monolithic block of code. It was only with the advent a new generation of frameworks like the Earth System Modelling Framework (ESMF) (DeLuca et al., 2012; Theurich et al., 2015) that fine-grained control over the components that constitute a climate model was made possible. For instance, ESMF allows configuring components as trees, where the leaf nodes could represent physical processes such as radiation and the root of the tree could represent a physical domain such as the atmosphere. Such a hierarchical ordering of physical processes is also present in Python-based modelling packages – previous versions of climt (Climate Modelling and Diagnostics Toolkit) and climlab (https://github.com/brian-rose/climlab, last access: 20 August 2018) allow building models in a manner similar to ESMF. However, the norm continues to be that climate model composition is configured by name list variables and boolean flags in the code rather than framework-based approaches (like the component trees that ESMF allows).
Another emerging concern in the scientific community is that of the reproducibility of research (Peng, 2011). While publicly available climate models do provide validated configurations that are in principle completely reproducible, climate scientists routinely need to make changes to the code that are hard to document (or understand). Short of sharing a copy of the entire code base, such modifications makes it difficult to reproduce simulations. While some level of code manipulation is inescapable, we note from our own experience and from reading about such modifications in the literature that most of them follow set patterns which could easily be provided by modelling frameworks themselves.
In this paper we present a new modelling framework, sympl (System for Modelling Planets), and a model toolkit, climt. While sympl focuses on providing a model framework, a rich taxonomy of components, and model agnostic configuration options, climt focuses on providing a broad array of physical components to allow users to build scientifically useful models. climt also provides model-dependent configuration options and helper functions to create an initial model state as required by the components.
sympl and climt allow finer-grained control over the composition of the model, with individual components representing physical processes (such as radiation and convection) rather than physical domains. Attempting to model the climate system at the physical process level has its own challenges, which we attempt to solve in these packages. Initiatives to build frameworks to traverse the hierarchy between highly idealized models to full-scale GCMs (general circulation models) do exist (Fraedrich et al., 2005; Vallis et al., 2018), but we believe the definition of a clear set of classes to represent the physical process level of the model to be unique.
sympl and climt allow writing models which are easy to use and facilitate the reproducibility of simulations. Both these packages are subject to ongoing development but have reached a level of maturity that makes it worthwhile to document them here. In Sect. 2, we present a series of models written using sympl and climt which illustrate the construction of a model hierarchy. In Sect. 3 we describe some challenges that modelling frameworks have to solve (in the context of the above examples when possible) and discuss how sympl and climt address these challenges. We then discuss in more detail the interfaces of sympl (Sect. 5) and climt (Sect. 6). In Sect. 7 we present some benchmark calculations and conclude with a discussion of developments planned in the future.
To illustrate the advantages of the fine-grained control that sympl and climt offer, we consider a series of examples starting with a diagnostic radiative calculation and ending with an idealized three-dimensional atmospheric general circulation model.
Line 1: import the climt package.
Line 4: instantiate a radiative transfer component.
Line 7: create a state dictionary which contains all the quantities required as inputs by the radiative transfer component.
Line 9: calculate the heating rate (available in “tendencies” and any associated diagnostics such as the radiative fluxes (available in “diagnostics”).
In Fig. 1, we have used the default values for
model_state, which corresponds to an isothermal atmosphere.
This example, though seemingly simple, is remarkable because of the ease with which such a diagnostic calculation can be performed. Traditionally, such a calculation would involve writing a Fortran driver, compiling it with the radiative transfer library, writing the output to a file, and then reading the output file into a suitable environment for further analysis. The ease of use illustrated in the above example is a direct result of the fine-grained control that sympl and climt allow – individual components can be configured and run (interactively, if desired) without having to compile them with a driver file. To summarize, components, not models, are first-class entities within the sympl framework.
It is worth examining this example in greater detail, since it highlights
some important features of sympl and climt that we will
look at closely in subsequent sections. The component called
“radiation” is an implementation of the sympl entity called
TendencyComponent, which is a template for components which
calculate tendencies of a quantity (air temperature in this case) based on
quantities in a state dictionary. We will encounter other kinds of components
in subsequent examples.
Figure 2 builds upon the previous example to create a model that includes both radiation and convection, steps the state quantities forward in time, and writes the output to a file. The changes from the previous script are as follows:
Line 7: define the model time step using
datetimelibrary, which is part of any Python distribution.
Line 21: create a time integrator which steps the model state forward in time, using tendencies generated by the radiation and convection components. The integrator chosen is an instance of the sympl
Stepperclass which implements a variety of Adams–Bashforth schemes.
Line 24: create a “monitor” component which writes the model state to a netCDF file.
Lines 27–37: the boundary layer component provides new values of model quantities, which are used to update the model state. The time integrator incorporates the tendencies due to radiation and convection and provides new values for the model state as well. The current model state is updated with diagnostics and written to a netCDF file. The model state is then updated with the new model quantities to prepare for the next iteration.
This example illustrates how to piece together a radiative–convective equilibrium (RCE) model from different kinds of sympl components. This example also moves up the model hierarchy, away from static diagnostic calculations to a model which evolves in time. It is worth noting that the first half of the example which creates components and a model state remains identical to the procedure followed in the previous example. This example shows two notable features of the design of sympl and climt: (i) individual components can step the model state forward themselves; (ii) the model integrator is a separate entity in itself which can be replaced easily (to use more stable integration schemes, for example). These features are in keeping with our goal of capturing the diversity of model components – some of which produce tendencies and others that produce new state quantities – and allowing the user to have fine-grained control over model configuration.
This example also illustrates how sympl's design provides a clear understanding of the model to users. Configuring the model consists of modifying a run script which is meant to be entirely legible to the user. By reading the run script, one can see exactly which model components are being used, how the state is initialized, what configuration options are being passed to which components, what time integration scheme is used on which components, the order in which components are being called, and the point within the integration where the state is being output to a file.
Figure 3 presents the next step in the hierarchy from Fig. 2, creating a moist atmospheric general circulation model (AGCM) in an aqua-planet configuration with a prescribed sea surface temperature. The default surface temperature is horizontally uniform, but we omit prescribing a realistic temperature distribution for purposes of comparison with Fig. 2. The main differences from the previous example are as follows:
Line 14: the boundary layer component is wrapped to output tendencies instead of a new state. This wrapper is required since the spectral dynamical core must apply tendencies to most quantities in spectral space for numerical reasons.
Line 18: the spectral dynamical core is used as the time stepper instead of the Adams–Bashforth scheme used previously.
Lines 22 and 25: since the model now has three dimensions, we require a grid describing the latitudes and longitudes as well. Line 18 creates a model grid, and the default model state created in Line 21 uses this grid to create an appropriate three-dimensional state.
A remarkable fact about this example is that it is only one line longer than
the previous example. Intuitively, this seems appropriate – an AGCM can be
thought of as a collection of radiative–convective columns which communicate
with each other via the dynamical core. However, in most modelling frameworks
the intuitive picture of the transition from an RCE to an AGCM does not
easily translate to code. This plug-and-play behaviour is a direct
consequence of the modularity of individual sympl components and the
fact that all components of a similar kind (
GFSDynamicalCore in this case) implement the same interface, making
it possible to reuse almost the entire model script from the RCE case. In
this way, sympl and climt allow constructing models in such
a way that a change that intuitively seems small also translates to a code
change that is small.
In this following section, we look in detail at the design decisions that allow for the construction of a model hierarchy as described in the three examples in this section.
In this paper we distinguish between modelling frameworks, model toolkits, and models themselves. A framework (such as sympl) consists of abstractions of “infrastructure” code that allow the creation of climate models. A framework creates rules one must follow but by doing so ensures models using the framework are easier to write, understand, and combine with one another. A toolkit (such as climt) implements those abstractions for concrete physical processes and may provide additional functionality that is not covered by the framework. A model itself (such as in Figs. 2 and 3) is written using components that may come from a toolkit which follows the guidelines of the framework.
Modelling frameworks should enable scientists to intuitively combine model components and create an appropriately complex model for the scientific question at hand. The user should also be able to specify details such as the order in which components are called and the time stepping schemes used. Model toolkits should provide a wide variety of components that enable users to write a model appropriate to the question at hand. Toolkits should also maintain a list of quantities and numerical grids that are required by the components it provides to facilitate creating model arrays. It is also desirable that the process of creating a model is fairly easy to understand and that the model code be self-documenting to eliminate the need to write additional documentation whenever possible.
3.1 Component diversity
One of the major aims of sympl and climt is to allow fine-grained control over the processes that constitute an Earth system model. In currently available modelling frameworks, the integration of scientific code (or model components) and the modelling framework happens at the physical domain level – atmosphere, ocean, land, and so on. The processes that operate within each physical domain (fluid dynamics, radiation, convection etc.) are not accessible in a systematic manner, and their code becomes tightly coupled and difficult to modify. For example, changing the radiation code in an atmospheric model is typically harder than changing the kind of ocean the atmosphere is coupled to (slab, dynamical, etc.). Tight coupling of code at the process level can make these models more efficient but at the cost of scientist efficiency, as the code is significantly less flexible and have undocumented and complicated interfaces between components.
Furthermore, the fact that certain process-level components are written to work only with certain other components demands an “architectural unity” (Randall, 1996) which might also encourage tight integration of model components. Within sympl, it may still be the case that two components are incompatible with one another (for example, using different thermodynamic quantities), but because their interfaces are clearly defined, it is easier to couple these components (for example, by converting thermodynamic quantities using an additional component). Other incompatibilities are handled automatically. For example, sympl automatically ensures components which use different units or dimension orderings can be used alongside one another.
If we allow for configuration at the process level, we are then faced with model components which behave quite differently: some components (like radiation) return tendencies, while others (like large-scale condensation) return a new value of a physical quantity. sympl provides a set of component classes that is comprehensive enough to capture the diversity of component behaviours.
An example of a sympl component is depicted in Fig. 4.
tendency_properties are “property” attributes containing details
of the required inputs and returned outputs, tendencies, or diagnostics.
Here, the input is a quantity
air_temperature whose horizontal dimensions are
longitude and whose values in the vertical are defined at model
mid-levels. The units are specified for each quantity. The
array_call method will be called by sympl with
arrays that are automatically extracted from the model state to satisfy the
input property specifications. It is written to return
numpy arrays as output
which satisfy its tendency and diagnostic property specifications. The
__call__ method which is called, for example, in Line 9 of
Fig. 1, is implemented by the base sympl
classes. This method encapsulates the boilerplate code of performing
consistency checks on the model state, extracting
numpy arrays with the
correct units and shape, and creating a new state dictionary from the arrays
Dimension and unit requirements in sympl are not restrictions on the inputs but rather describe the internal representation used by the component. sympl will automatically convert the input state to satisfy these requirements and raise an exception if that is not possible. In this way, the property dictionaries act as self-documenting code, which both documents the component interface and is used to convert input arrays to the desired dimension ordering and units for the component.
TendencyComponent (discussed in Sect. 5),
this component outputs tendencies of a quantity
array_call() method accepts a dictionary containing just the
numpy arrays extracted from the model state with the correct units
and dimensions and returns the temperature tendency as specified in the
Since components in sympl and climt are first-class entities, they are not dependent on any other code for execution – Fig. 1 shows how to interact solely with a radiative component without the need for an integrator or any other entity. This behaviour serves an important educational purpose and facilitates diagnostic calculations made very often during research.
3.2 Model configuration
Together, sympl and climt allow a natural way of configuring various aspects of a climate model listed below:
physical configuration (toolkit agnostic) – the physical constants required by the model components;
algorithmic configuration (toolkit specific) – the “tunable” parameters which modify the behaviour of the algorithm which represent physical processes, for example, the entrainment coefficient in a convective parameterization;
interfacial configuration (toolkit agnostic) – modifications applied to inputs and outputs at the interface of a component, further described below;
memory and computing resource configuration (toolkit specific) – the layout of arrays used by the model and the distribution of the model components over the available number of processors and coprocessors;
compositional configuration (toolkit specific) – the components that compose a model, any dependency between components, and the order in which components are executed all need to be described.
Figure 5 shows types of configuration information and where such a configuration lies in climt/sympl and in traditional climate models. In contrast to sympl, where all configuration passes through a readable run script, the sheer variety of locations where the configuration resides in traditional models makes it hard to keep track of what configuration information has changed and how configuration changes affect model runs. This makes it daunting for beginners to write models. Model configuration using sympl and climt is highly centralized and easily accessible – all configuration passes through the model run script, which is written to be readable and accessible to model users. Such a centralized configuration reduces errors arising due to a misconfiguration of the model.
While most of the configuration elements listed above are familiar,
interfacial configuration is mostly unheard of and usually applied in a
non-systematic manner within climate models. The
TimeDifferencingWrapper used in Line 14 of Fig. 3
is an example of the interfacial configuration of a climate model component.
There are a variety of interfacial modifications which are commonly applied
to components and can be applied in a consistent manner across components.
For instance, model components could
normally return a new value rather than a tendency of some physical quantity, but in certain instances a tendency might be desirable (as in Fig. 3);
provide output that is piecewise constant in time – the output is updated only once every N iterations, and the same value it output for the next N−1 iterations; this behaviour is normally used in radiative transfer codes;
scale some of its inputs or outputs by some floating point number; this kind of behaviour is desirable, for example, in mechanism denial studies.
We can interfacially modify certain behaviours of model components, such as
the ones above, by interacting with only the inputs and outputs of scientific
components. sympl formalizes such a configuration by providing wrapper
Model arrays are contained within the
DataArray abstraction provided
xarray (Hoyer and Hamman, 2017) rather than the more low-level
DataArrays are an abstraction over
numpy arrays with a more natural fit to climate data by providing
labelled dimensions and metadata storage capabilities.
DataArray object is a subclass of the
xarray DataArray that provides unit handling and conversion and
will be described subsequently. This exposes the powerful analysis
xarray, allowing users to build an end-to-end
pipeline entirely within Python, from simulation to data analysis and
the generation of publication-ready figures.
Since low-level array operations using
fairly simple, especially changing coordinate ordering and C/Fortran memory
ordering, climt only provides guidelines for memory layout of
arrays. However, changing the ordering of dimensions in memory will incur the
performance penalty of copying the array if the model calls compiled code.
In the interest of the readability of component and model code, sympl
strongly encourages and climt makes use of descriptive names for model
quantities, adhering to the climate and forecast (CF) conventions when
possible.1 Most pre-defined quantities in climt use
names derived from the CF conventions. One additional suffix that we found
necessary to use was
on_interface_levels to distinguish between
quantities defined on the interfaces and mid-levels of the vertical grid. For
air_temperature refers to the air temperature defined at the
vertical grid centre, whereas
refers to the same quantity defined at the vertical grid edges. This
convention is only a requirement within the model state. Within an individual
component, shorter names can be used for variables representing quantities.
This shorter name is also contained in the property dictionary of the
component, serving as documentation for the meaning of that shorter variable
Python was used as the language to write the framework. Python as a language and the Python ecosystem have a number of desirable features, all of which were taken advantage of during the development of sympl and climt:
Earlier versions of climt were written in Python, which gave the authors an idea of the convenience and flexibility it afforded. In particular, the object-oriented capabilities of Python provide a straightforward way to represent the component-based architecture adopted by almost all climate modelling frameworks.
Scientific libraries within the Python ecosystem now offer acceptable performance for computationally intensive operations typically used in climate models.
The Python ecosystem includes many libraries which can be useful in developing climate models. Examples include machine learning, graphics, and web service libraries.
Python's ability to act as a glue language allows interfacing with the large number of libraries for climate modelling already available in Fortran.
Tools available in the Python ecosystem like
sphinxenable writing reproducible workflows and code that is well documented and tested.
sympl conceives of a climate model as a state that is continuously updated by various components. sympl's taxonomy consists of seven kinds of components. Four of these component types are used to represent physical processes and the remaining represent other functionality required to build and run models.
RRTMGLongwavein Fig. 1, which take the model state as input and return tendencies of quantities and values of quantities defined at the time of the input state.
SimplePhysicsin Fig. 2, which take the model state and a timestep as input and return values of quantities defined at a new time (after the timestep) and optionally diagnostic values of quantities defined at the time of the input state. These are implicit as they define the target model state in terms of the target model state (e.g. that the target state is not supersaturated).
DiagnosticComponentobjects which take the model state as input and return quantities defined at the time of the input state as output.
EmanuelConvectionin Fig. 2, which return tendencies but require the model timestep to produce these tendencies. This is required when the tendencies are defined in terms of the target model state, as is often done in convection schemes or flux limiters. These should generally be written as
Steppercomponents which can later be wrapped into
ImplicitTendencyComponentobjects if needed (such as with
SimplePhysicsin Line 14 of Fig. 3).
AdamsBashforthin Fig. 2, which contain a set of
TendencyComponentobjects and use the tendencies they output to integrate the model state forward in time.
NetCDFMonitorin Fig. 2, which provide a
storemethod which takes the model state as input and “stores” it. The implementation of this method is left to the user and is currently used for NetCDF output and plotting.
Wrapper components like
TimeDifferencingWrapperin Fig. 3, which contain other sympl components and modify the inputs passed to or outputs generated by the “wrapped” component.
Schematics of how the above components interact with the model state are
presented in Fig. 6. A
object (panel a) is very simple, producing diagnostic quantities that are
inserted or updated in the current model state.
(panel b) take in the model state and perform some action using it. The
Stepper object (panel c), which steps the
model state forward in time, is slightly
more complicated. In addition to producing a new state, it can
produce diagnostic quantities which are inserted or updated in the current
model state. Panel (d) is the most complex, depicting how
TendencyComponent objects are used with a
to update the model state. Once created, a
exactly like a
Stepper object. Internally, it provides the input
state to the
TendencyComponent objects to compute tendencies and
uses those tendencies according to its time stepping scheme to evolve the
model state forward in time. The
TendencyStepper provides the same
model state to all
TendencyComponent objects it contains and sums
the tendencies before stepping forward in time (see
Fig. 6d). Using other time marching algorithms such
as sequential tendency or sequential update splitting
(Donahue and Caldwell, 2018) will require users to implement their own
TendencyStepper object or to call several
TendencyStepper components in sequence.
As mentioned previously, wrapper components contain a wrapped component and modify the inputs or outputs, changing how the component appears to behave. Currently, sympl has the following wrappers:
TimeDifferencingWrappercreates tendencies from the output of a
Steppercomponent by first-order differencing. This creates an
Steppercomponent, which is required when using spectral methods.
UpdateFrequencyWrappercalls a wrapped
TendencyComponentonly after the user-specified time interval has elapsed and until then outputs the previously returned value. In effect, this creates a piecewise constant output tendency which can reduce the computational load during a simulation. This is often used on radiation schemes.
ScalingWrapperscales the inputs passed into the wrapped component and the outputs (new state, tendency, or diagnostic) returned by the component.
This taxonomy of components is larger than those typically used in modelling frameworks. For example, ESMF only considers two kinds of components – Gridded and Coupler components. However, as discussed previously, this extended taxonomy is required to capture the diversity of components that arise if models are written to be configurable and modular at the process level.
5.1 Model state and the
The model state is a dictionary whose keys are the names of model quantities,
and values are sympl
DataArray objects. The model state also contains a
time whose value is an object that implements the Python
timedelta interface. sympl provides an interface to use the
datetime objects from the
cftime package to support several different
calendars,2 as well as dates not supported by the
numpy datetime64 or
datetime objects. A schematic of the model state is presented in
Fig. 7. sympl does not put hard restrictions on the
name of model quantities, though standardized names should be used to ensure
inter-package compatibility. All
DataArray objects must define a string
attribute called “units”, which is used to convert the data contained
within to the appropriate units requested by a component. The units
conversion is performed internally using the
library.3 Since the actual contents of the state are dependent on
model details, sympl assumes that the initialization of the model state
will be done by a model package (such as climt) or by the user.
5.2 Physical constants
sympl maintains a unit-aware library of constants which can be accessed or
modified by model packages and by the user through
set_constant() functions. For example,
planetary_rotation_rate can be
changed with a single function call. The unit handling is important to ensure
constants are given to components in the units they each require. For
example, the Rapid Radiative Transfer Model (RRTMG) radiative transfer code (Clough et al., 2005)
requires physical constants in centimetre–gramme–second (CGS) units.
For the purposes of logging, physical constants are classified into various categories:
planetary constants such as rotation rate and acceleration due to gravity;
physical constants such as the speed of light;
atmospheric constants such as specific heat of dry air and reference air pressure;
stellar constants such as stellar irradiance;
condensible constants which refer to the thermodynamic properties of the condensible (in all three phases) in the atmosphere;
oceanographic constants such as the reference sea water density.
We chose to keep the constants related to the condensible component of the
atmosphere separate to ensure sympl is flexible enough to handle
general planetary atmospheres. sympl provides a function
set_condensible(), which allows switching all constants related to
the condensible. For example
replace all condensible constants (such as density of liquid, solid, or gaseous
phases and the latent heat of condensation) to those corresponding to methane,
provided such constants are already in the constants dictionary. The default
condensible is water, which is currently the only condensible compound for
which default values are given by sympl.
5.3 Modelling using sympl
Initialize model components, providing configuration information.
Wrappercomponents to modify the behaviour of any components if necessary.
Initialize model state which contains all quantities required by the selected components.
TendencyStepperto collect all
TendencyComponentcomponents into a component that can step the model state forward in time.
Begin the model main loop.
DiagnosticComponentto compute any derived quantities from prognostic quantities or provide forcing quantities at a given time step.
Steppercomponents and get a new state dictionary with the updated model quantities and any diagnostics. Update the initial model state with diagnostics.
TendencyStepperand get a new state dictionary with the updated model quantities and any diagnostics. Update the initial model state with diagnostics.
Monitorcomponents to store the initial model state (e.g. store to disk, display in real time, send over the network).
Increment model time and repeat the model main loop.
6.1 Model state, quantity dimensions, and output dictionaries
For initialization, climt provides the functions
creates quantities that define the grid such as latitude, longitude, and air
get_default_state() accepts a list of components and
optionally a state with grid quantities and creates a state dictionary which
satisfies the input requirements for those components. Default values of each
model quantity are defined centrally in climt. The default values
provided are scientifically meaningful and can be used without modification
for certain simulations.
6.2 Model composition
Currently, the creation of the model and running the simulation loop is done
by hand, which provides a better understanding of what the model is doing but
increases the verbosity of model code. In the near future, climt
will provide an additional class called
Federation which automates
the process of creating a model from its components.
would not require that the user know the difference between a
Stepper (for instance) and their
different call signatures or that
TendencyComponents require a
TendencyStepper to step the model state forward in time. This makes
creating models, easy especially for those who are not familiar with climate
modelling. The tradeoff is that the run script will not explicitly describe
the sequence of the main loop because that information is hidden within the
Federation code, but this can be desirable for certain applications,
particularly in education. As mentioned before, this automation is possible
only because of the rich taxonomy of components sympl provides.
6.3 Features and software engineering
climt currently has the following components that can be used to build models.4
RRTMG longwave and shortwave radiative transfer (Clough et al., 2005): this is a Fortran component accessed via a Cython wrapper. RRTMG is a state-of-the-art radiative transfer code used in many climate models.
Grey gas radiation scheme: simulates radiative transfer in a grey gas. This component is accompanied by another component that provides an optical depth distribution which mimics the effect of water vapour (Frierson et al., 2006). These components are written in pure Python. This radiative scheme has been used in many idealized climate dynamics simulations to isolate the thermodynamic effects of latent heat release from the radiative effects of water vapour, which is a strong greenhouse gas.
Insolation: this component is written in pure Python. It provides the solar zenith angle based on the time available in the model state. This zenith angle is used in radiative transfer codes. Currently, this component uses approximations and orbital parameters which make it highly accurate for Earth but inapplicable to other planets.
Emanuel convection scheme (Emanuel and Zivkovic-Rothman, 1999): this is a Fortran component accessed via a Cython wrapper. It is a mass-flux-based convection scheme which is based on the boundary layer quasi-equilibrium hypothesis (Raymond, 1995).
Grid scale condensation: this is written in pure Python. It calculates the water vapour and temperature fields in the atmosphere after condensing out excess water vapour to keep the atmospheric column from becoming supersaturated.
Spectral dynamical core: this component is derived from the General Forecast System (https://github.com/jswhit/gfs-dycore, last access: 20 August 2018). It is a Fortran module accessed via a Cython wrapper. It uses a high-performance spherical harmonics library
shtns(https://bitbucket.org/nschaeff/shtns, last access: 20 August 2018). It is parallelized using OpenMP (Dagum and Menon, 1998) and therefore is most effective on shared memory systems. The dynamics are stepped using an implicit–explicit total variation diminishing Runge–Kutta 3 time stepper. The physics tendencies are stepped forward using a forward Euler scheme.
Simple Physics package for idealized simulations (Reed and Jablonowski, 2012): this is a Fortran module accessed via a Cython wrapper. It provides initial conditions which can be used for testing moist dynamical cores and also provides a simple diffusive boundary layer suitable for idealized simulations.
Slab surface: this component is written in pure Python. It allows for a prognostic surface temperature by calculating the surface energy budget. It is flexible enough to represent land or ocean. It currently does not account for localized heat fluxes.
Sea/land ice model: this allows for snow and ice layers and energy-balanced top and bottom surfaces. This component is written in pure Python. It is flexible enough to represent ice/snow growth and melting. It is capable of representing sea or land ice based on the surface type available in the model state. It currently cannot handle fractional land surface types.
Held–Suarez forcing (Held and Suarez, 1994): this component is written in pure Python. It provides an idealized set of model physics which can be used for testing dry dynamical cores and idealized simulations.
Initial conditions from the dynamical core Model Intercomparison Project (MIP) (DCMIP) (https://www.earthsystemcog.org/projects/dcmip/, last access: 20 August 2018): this is a Fortran module accessed via a Cython wrapper. It provides initial conditions for a wide variety of tests which allow assessing the conservation properties of dynamical cores.
This set of components allows building a hierarchy of models ranging from single-column radiative–convective models to energy-balanced moist atmospheric general circulation models. Because of the fine-grained configurability of sympl/climt, the difference between the number of lines of code required to build a single-column model and a moist GCM is only around 10 lines of Python code (see scripts in the Supplement). More importantly, most of the code is reusable when moving from a simpler to a more complex model.
Both sympl and climt are open-source projects, licensed under a permissive BSD license. Both packages are available on Mac, Linux, and Windows platforms and can be directly installed from the Python Package Index using one-line commands:
pip install sympl pip install climt.
This eliminates the need to download source code from GitHub. The Python Package Index projects are located at https://pypi.python.org/pypi/sympl (last access: 20 August 2018) and https://pypi.python.org/pypi/climt (last access: 20 August 2018) respectively.
climt also provides binary releases on all supported platforms, eliminating the need to have a compiler on the user's system. sympl is written in pure Python and does not have any compiler requirements. Both packages are regression tested using the online services TravisCI (https://travis-ci.org/, last access: 20 August 2018) and AppVeyor (https://www.appveyor.com/, last access: 20 August 2018). Both packages also maintain regularly updated documentation at http://sympl.readthedocs.io/en/latest/ (last access: 20 August 2018) and http://climt.readthedocs.io/en/latest/ (last access: 20 August 2018).
The first simulation is that of an atmospheric column that is run to equilibrium in the presence of radiation and convection. This model uses the RRTMG longwave and shortwave components, the Emanuel convection scheme, the Simple Physics component as its boundary layer scheme, and a slab ocean of thickness 50 m. The model timestep is 5 min and the results presented in Fig. 8 are the mean between 15 000 and 20 000 timesteps. The air temperature transitions from a dry adiabat in the boundary layer to a moist adiabat in the free atmosphere until the tropopause. The diabatic heating balance changes from a balance between radiation and convection in the troposphere of the model to a pure radiative equilibrium in the stratosphere.
The second simulation is of an idealized aqua-planet GCM with fixed equinoctial insolation. As mentioned before, the modular nature of our framework allows the reuse of much of the runscript code from the above single-column model. It consists of all the components used in the previous model along with a dynamical core which is used as the time stepper. The model was run for 2 years and the results presented are the mean over the last 6 months. The simulated climate of the model is as expected from such a configuration: the zonal mean zonal winds show two strong westerly jets which penetrate to the surface. The zonal mean temperature shows a distinct tropical cold point and an increase in the temperature above the tropopause. The zonal mean convective heating rate shows deep heating in the tropics and much shallower heating in the subtropical areas dominated by the descent of air. This simulation ran at a resolution of 128 longitudes, 62 latitudes (or T42 resolution), and 28 levels.
sympl and climt represent a novel approach to climate modelling which provides the user with fine-grained control over the configuration of the model. sympl provides a rich set of entities which describe all functionality typically expected of a climate model. This set of entities (or classes) allows climt to be an easy to use climate modelling toolkit by allowing decisions about model creation and configuration to be made at a single location (the run script) and without ambiguity. The modular nature of the packages allows for code reuse as one traverses the hierarchy of models from single-column model to three-dimensional GCMs. We attempt to address concerns about plug-and-play type architectures (Randall, 1996) by ensuring the inputs and outputs of each model are cleanly documented, which makes it clear whether components are compatible or not. The use of Python allows for delegating computationally intensive code to compiled languages while still providing an intuitive and clean interface to the user. This choice also allows users access to a large variety of libraries written in Python for purposes ranging from machine learning to visualization (Alpire, 2017).
The main focus in the near future would be to add more components, especially a cloud microphysics scheme, to allow sympl/climt to simulate a more realistic benchmark climate. Due to its flexibility, we believe our modelling framework is well suited to the simulation of general planetary atmospheres and for exoplanet modelling, and adding components relevant to these fields will also be a priority. Another important component to add would be a flexible grid interpolation component to allow interaction between components based on different model grids. While care has been taken to ensure that parallel computing is possible, we have yet to address the question of distributed memory and computing. While building models in a simple Message Passing Interface (MPI) scenario seems feasible in the near future, more sophisticated configurations with components running in parallel will need some thought and design.
Nevertheless, sympl and climt represent an important step towards creating flexible, usable, and readable models. We hope that they will be a useful addition to the growing collection of Python-based tools available to the climate science community.
sympl is available at https://github.com/mcgibbon/sympl (last access: 20 August 2018). The digital object identifier (DOI) for the version documented in this paper is https://zenodo.org/record/1346405 (McGibbon et al., 2018).
climt is available at https://github.com/CliMT/climt (last access: 20 August 2018). The digital object identifier (DOI) for the version documented in this paper is https://zenodo.org/record/1400103 (Monteiro et al., 2018).
The supplement related to this article is available online at: https://doi.org/10.5194/gmd-11-3781-2018-supplement.
All three authors were involved in the design of sympl and climt. JMM and JM were involved in writing code for the packages. RC was the main author of earlier versions of climt. All three authors contributed to the writing of the paper.
The authors declare that they have no conflict of interest.
Joy Merwin Monteiro and Rodrigo Caballero acknowledge a research grant from
the Swedish e-Science Research Center (SeRC). Jeremy McGibbon was funded by
DOE grant DE-SC0016433 as a contribution to the CMDV (CM)4 project and by
the Natural Sciences and Engineering Research Council of Canada (NSERC)
Postgraduate Scholarship Doctoral Program (PGS-D).
The article processing charges for this open-access
publication were covered by Stockholm University.
Edited by: Carlos Sierra
Reviewed by: Spencer Hill and one anonymous referee
Alpire, A.: Predicting Solar Radiation using a Deep Neural Network, available at: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-215715 (last access: 20 August 2018), 2017. a
Clough, S. A., Shephard, M. W., Mlawer, E. J., Delamere, J. S., Iacono, M. J., Cady-Pereira, K., Boukabara, S., and Brown, P. D.: Atmospheric radiative transfer modeling: a summary of the AER codes, J. Quant. Spectrosc. Ra., 91, 233–244, https://doi.org/10.1016/j.jqsrt.2004.05.058, 2005. a, b, c
Dagum, L. and Menon, R.: OpenMP: an industry standard API for shared-memory programming, IEEE Comput. Sci. Eng., 5, 46–55, 1998. a
DeLuca, C., Theurich, G., and Balaji, V.: The Earth System Modeling Framework, in: Earth System Modelling, vol. 3, SpringerBriefs in Earth System Sciences, Springer, Berlin, Heidelberg, 43–54, https://doi.org/10.1007/978-3-642-23360-9_6, 2012. a
Emanuel, K. A. and Zivkovic-Rothman, M.: Development and Evaluation of a Convection Scheme for Use in Climate Models, J. Atmos. Sci., 56, 1766–1782, https://doi.org/10.1175/1520-0469(1999)056<1766:DAEOAC>2.0.CO;2, 1999. a
Frierson, D. M. W., Held, I. M., and Zurita-Gotor, P.: A Gray-Radiation Aquaplanet Moist GCM. Part I: Static Stability and Eddy Scale, J. Atmos. Sci., 63, 2548–2566, https://doi.org/10.1175/JAS3753.1, 2006. a
Held, I. M. and Suarez, M. J.: A Proposal for the Intercomparison of the Dynamical Cores of Atmospheric General Circulation Models, B. Am. Meteorol. Soc., 75, 1825–1830, https://doi.org/10.1175/1520-0477(1994)075<1825:APFTIO>2.0.CO;2, 1994. a
Randall, D. A.: A University Perspective on Global Climate Modeling, B. Am. Meteorol. Soc., 77, 2685–2690, https://doi.org/10.1175/1520-0477(1996)077<2685:AUPOGC>2.0.CO;2, 1996. a, b
Raymond, D. J.: Regulation of Moist Convection over the West Pacific Warm Pool, J. Atmos. Sci., 52, 3945–3959, https://doi.org/10.1175/1520-0469(1995)052<3945:ROMCOT>2.0.CO;2, 1995. a
Reed, K. A. and Jablonowski, C.: Idealized tropical cyclone simulations of intermediate complexity: a test case for AGCMs, J. Adv. Model. Earth Sy., 4, M04001, https://doi.org/10.1029/2011MS000099, 2012. a
Theurich, G., DeLuca, C., Campbell, T., Liu, F., Saint, K., Vertenstein, M., Chen, J., Oehmke, R., Doyle, J., Whitcomb, T., Wallcraft, A., Iredell, M., Black, T., Da Silva, A. M., Clune, T., Ferraro, R., Li, P., Kelley, M., Aleinov, I., Balaji, V., Zadeh, N., Jacob, R., Kirtman, B., Giraldo, F., McCarren, D., Sandgathe, S., Peckham, S., and Dunlap, R.: The Earth System Prediction Suite: Toward a Coordinated U.S. Modeling Capability, B. Am. Meteorol. Soc., 97, 1229–1247, https://doi.org/10.1175/BAMS-D-14-00164.1, 2015. a, b
Vallis, G. K., Colyer, G., Geen, R., Gerber, E., Jucker, M., Maher, P., Paterson, A., Pietschnig, M., Penn, J., and Thomson, S. I.: Isca, v1.0: a framework for the global modelling of the atmospheres of Earth and other planets at varying levels of complexity, Geosci. Model Dev., 11, 843–859, https://doi.org/10.5194/gmd-11-843-2018, 2018. a
http://cfconventions.org/Data/cf-standard-names/48/build/cf-standard-name-table.html (last access: 20 August 2018).
All components written in pure Python were written by the authors for use in climt.