This paper describes the rpe (reduced-precision emulator) library, which can emulate the use of arbitrary reduced floating-point precision within large numerical models written in Fortran. The rpe software allows model developers to test how reduced floating-point precision affects the results of their simulations without having to make extensive code changes or port the model onto specialized hardware. The software can be used to identify parts of a program that are problematic for numerical precision and to guide changes to the program that allow a stronger reduction in precision.

The development of rpe was motivated by the strong demand for more computing power. If numerical precision can be reduced for an application under consideration while still achieving results of acceptable quality, computational cost can be reduced, since a reduction in numerical precision may allow an increase in performance or a reduction in power consumption. For simulations with weather and climate models, savings due to a reduction in precision could be reinvested to allow model simulations at higher spatial resolution or complexity, or to increase the number of ensemble members to improve predictions. rpe was developed with a particular focus on the community of weather and climate modelling, but the software could be used with numerical simulations from other domains.

Users of high-performance computing (HPC) have historically enjoyed a steady increase in computing power according to Moore's law, allowing the complexity of HPC applications to increase steadily over time. In recent years, advances in supercomputing power have come not from increasingly fast processors but from increasing the number of individual processors in a system. This has led to HPC applications being redesigned so that they scale well on many thousands of processors, and further effort will be required to make sure models are efficient on the massively parallel supercomputing architectures of the future. However, even for massively parallel supercomputers, continuing the exponential growth in computational power observed in the past appears to be out of reach with current technologies, primarily because power consumption scales linearly with the number of processors, so the power demand of future supercomputers would become excessive.

One approach to reducing the energy requirements of next-generation
supercomputers is to use so-called approximate computing or inexact hardware.
This is hardware that trades some accuracy and/or reproducibility for gains in
speed and/or efficiency. A number of studies have shown the promise of such
hardware implementations.

The recognized standard for floating-point arithmetic is IEEE 754.

Whilst aiming to understand the performance of complex numerical models on
new hardware architectures with inexact processing capabilities, it is useful
to be able to experiment with different levels of precision, including those
that are not supported natively by currently available hardware. Programmable
hardware could be used to study a reduction in precision for an application
of interest.

Tools for understanding rounding error in numerical simulations have been
developed already. For example, the CADNA library

A variety of tools also exist for automatically tuning floating-point
precision, and producing mixed-precision versions of double-precision
programs. Tools such as SAM

This paper introduces the rpe (reduced-precision emulator) software.

Weather and climate models are becoming increasingly important to society.
Even though they are often run on some of the fastest computing hardware
available, there is still not enough computational power to run them at
resolutions fine enough to capture small-scale processes that are believed to
be important, for example individual convective systems.


In this section, we introduce the rpe library and explain how to use the emulator, give details on how the emulator truncates floating-point numbers, and explain how precision will propagate through a simulation. Details of the software architecture and possible pitfalls and limitations of the emulator are also discussed.

The rpe software is implemented as a Fortran library. This choice of
language is due to the prevalence of Fortran in geophysical modelling. The
emulator is implemented as a Fortran module that can be included in existing
projects via the standard Fortran 90 module mechanism. The module defines a
new derived type named

The emulator's core functionality comes from overloading the assignment
(

Let us consider the program listed in Fig.

Operations performed by the reduced-precision program
(Fig.

The notation

A simple numerical program using

The reduced-precision program (Fig.

Operations performed by the mixed-precision program (Fig.

This example demonstrates a more complex situation where multiple precisions
are used, resulting in each multiplication casting to the precision of its
most precise input (individual precision values are assigned using the

A simple numerical program utilizing emulated mixed-precision arithmetic.

Examples of different rounding scenarios when truncating 32 bit
floating-point numbers to 10 significand bits.

Floating-point numbers consist of three parts: a sign bit that determines the
sign of the number, an exponent that determines its magnitude, and a
significand that determines its precision (the number of significant digits).
The approach to reduced floating-point precision taken in this paper is a
truncation of the significand: less significant bits of the significand are
removed by setting them to 0, which reduces the precision. A naïve truncation
that simply sets the least significant bits of the significand to 0 would
introduce a bias, since it is equivalent to always rounding toward 0 to the
nearest representation with the desired number of remaining significand bits.
We have encountered situations where such biases can significantly affect
conservation properties in simulations of computational fluid dynamics.
Therefore, it is necessary to apply rounding as well as truncation. The method
used to reduce precision in rpe emulates the default IEEE 754 rounding mode:
numbers are rounded to the nearest representation with the given number of
bits, and numbers exactly halfway between two representations are rounded to
the representation with a 0 in the least significant bit (round half to even).
This scheme is unbiased and therefore avoids introducing a systematic bias
error in addition to the unavoidable rounding error. Figure
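The rounding rule described above can be sketched in Python by operating directly on the bit pattern of a binary64 value. This is an illustrative sketch, not rpe's Fortran source: the function names `truncate_toward_zero`, `round_nearest_even` and `mean_error` are hypothetical, and `sbits` denotes the number of explicit significand bits retained (52 for double precision).

```python
import math
import struct


def _bits(x: float) -> int:
    """Raw IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def _float(b: int) -> float:
    """Reinterpret a 64 bit pattern as a binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def truncate_toward_zero(x: float, sbits: int) -> float:
    """Naive truncation: zero the discarded significand bits (biased toward 0)."""
    if not 0 <= sbits <= 52:
        raise ValueError("sbits must be in [0, 52]")
    drop = 52 - sbits
    if drop == 0 or not math.isfinite(x):
        return x
    b = _bits(x)
    return _float(b - (b & ((1 << drop) - 1)))


def round_nearest_even(x: float, sbits: int) -> float:
    """Keep `sbits` explicit significand bits, rounding to nearest, ties to even."""
    if not 0 <= sbits <= 52:
        raise ValueError("sbits must be in [0, 52]")
    drop = 52 - sbits
    if drop == 0 or not math.isfinite(x):
        return x  # nothing to discard; inf and NaN pass through unchanged
    b = _bits(x)
    low = b & ((1 << drop) - 1)   # the bits being discarded
    half = 1 << (drop - 1)        # half a unit in the last kept place
    kept = b - low                # pattern with the discarded bits cleared
    # The sign is stored separately, so these bits encode the magnitude and
    # negative values are rounded symmetrically.
    if low > half or (low == half and (kept >> drop) & 1):
        # Round the magnitude up. A carry out of the significand increments
        # the exponent, moving to the next binade (or to inf), as in IEEE 754.
        kept += 1 << drop
    return _float(kept)


def mean_error(rounder, sbits=10, n=4096):
    """Mean signed error of `rounder` over evenly spaced values in [1, 1 + 2**-8)."""
    xs = [1.0 + k * 2.0**-20 for k in range(n)]
    return sum(rounder(x, sbits) - x for x in xs) / n
```

For the evenly spaced inputs used by `mean_error`, naive truncation gives a negative mean error, while round half to even gives a mean error of exactly zero: the bias argument above in miniature.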

An

When an operation is performed on an

Because compound expressions are evaluated as separate intermediate terms,
care must be taken to ensure computations are expressed at the required
precision. For example, consider the computation

It is possible to change precision at runtime, either by changing the
module-level variable
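The mechanism described in this subsection, in which overloaded operators round the result of every individual operation, mixed-precision operations take the precision of their most precise input, and a module-level default can be changed at runtime, can be sketched in Python with operator overloading. This is a hedged illustration rather than rpe's Fortran interface: the class `RPVar`, its `sbits` attribute, `DEFAULT_SBITS` and `set_default_sbits` are hypothetical names, and the treatment of plain `float` operands is an assumption.

```python
import math
import operator
import struct

DEFAULT_SBITS = 23  # hypothetical module-level default precision


def set_default_sbits(sbits: int) -> None:
    """Change the default precision at runtime (affects later roundings only)."""
    global DEFAULT_SBITS
    DEFAULT_SBITS = sbits


def _round(x: float, sbits: int) -> float:
    """Round x to `sbits` explicit significand bits, ties to even."""
    drop = 52 - sbits
    if drop <= 0 or not math.isfinite(x):
        return x
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    low = b & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    b -= low
    if low > half or (low == half and (b >> drop) & 1):
        b += 1 << drop
    return struct.unpack("<d", struct.pack("<Q", b))[0]


class RPVar:
    """A value stored in binary64 but rounded to a chosen significand width."""

    def __init__(self, value, sbits=None):
        self.sbits = sbits  # None: follow DEFAULT_SBITS at the time of use
        # Construction plays the role of the overloaded assignment: round on store.
        self.val = _round(float(value), self.precision)

    @property
    def precision(self) -> int:
        return DEFAULT_SBITS if self.sbits is None else self.sbits

    def _binop(self, other, op, reflected=False):
        if isinstance(other, RPVar):
            # Mixed precision: the result takes the precision of the most precise input.
            sbits, other_val = max(self.precision, other.precision), other.val
        else:
            # Assumption: a plain float operand adopts this variable's precision.
            sbits, other_val = self.precision, float(other)
        a, b = (other_val, self.val) if reflected else (self.val, other_val)
        # Every operation rounds its own result, so each intermediate term of a
        # compound expression is rounded separately.
        return RPVar(op(a, b), sbits)

    def __add__(self, other):
        return self._binop(other, operator.add)

    def __radd__(self, other):
        return self._binop(other, operator.add, reflected=True)

    def __sub__(self, other):
        return self._binop(other, operator.sub)

    def __rsub__(self, other):
        return self._binop(other, operator.sub, reflected=True)

    def __mul__(self, other):
        return self._binop(other, operator.mul)

    def __rmul__(self, other):
        return self._binop(other, operator.mul, reflected=True)
```

With `a = b = 1 + 2**-10` and `c = -(1 + 2**-9)`, all stored with 10 significand bits, `a * b + c` evaluates to 0.0 because the product is rounded before the addition, whereas the exact result is `2**-20`.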

The result of truncating double-precision values to 23 significand bits (the
number used by IEEE single precision) with the emulator is the same as
casting from double to single precision using built-in types. The emulator
internally uses a 64 bit
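The equivalence between 23 bit emulation and a cast to single precision can be checked in Python, whose `struct` format characters `'f'` and `'e'` convert to IEEE binary32 and binary16. The sketch below compares rounding to 23 and 10 significand bits against these casts; it assumes a platform whose C `float` conversion rounds to nearest even (the usual default), and the function names are hypothetical. Agreement is only expected for magnitudes inside the target format's normal range, since rounding the significand does not restrict the exponent.

```python
import math
import random
import struct


def round_nearest_even(x: float, sbits: int) -> float:
    """Round x to `sbits` explicit significand bits, ties to even."""
    drop = 52 - sbits
    if drop <= 0 or not math.isfinite(x):
        return x
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    low = b & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    b -= low
    if low > half or (low == half and (b >> drop) & 1):
        b += 1 << drop
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def cast_single(x: float) -> float:
    """Convert to IEEE binary32 and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def cast_half(x: float) -> float:
    """Convert to IEEE binary16 and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mismatches(n: int = 20000, seed: int = 0) -> int:
    """Count disagreements between emulated rounding and native casts."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(n):
        sign = rng.choice((-1.0, 1.0))
        mant = 1.0 + rng.random()  # in [1, 2]
        # Single precision: exponents well inside the binary32 normal range.
        x = sign * math.ldexp(mant, rng.randint(-100, 100))
        if round_nearest_even(x, 23) != cast_single(x):
            bad += 1
        # Half precision: |x| within [2**-14, 2**15], inside the binary16 normal range.
        x = sign * math.ldexp(mant, rng.randint(-14, 14))
        if round_nearest_even(x, 10) != cast_half(x):
            bad += 1
    return bad
```

Outside the normal range the two differ: `cast_half(1e-10)` underflows to 0.0, whereas `round_nearest_even(1e-10, 10)` remains nonzero because only the significand is rounded.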

The emulator software consists of a relatively small hand-coded core. This
core implements the

The software is provided with a suite of tests to verify its functionality. These tests are used during development and can be run by end users to verify that their build of the software functions as expected. The tests are divided into two categories: unit tests and integration tests. The unit tests check the functionality of individual components of the emulator in isolation from other components. The integration tests verify the behaviour of the emulator as a whole by running a simple Lorenz attractor model. These tests also serve as a useful example of how to use the emulator.
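The following Python sketch shows the kind of check an integration test based on the Lorenz system can perform: the same forward-Euler integration is run with every intermediate result rounded to a chosen number of significand bits. It is a hypothetical illustration, not the test code shipped with rpe; the function name, the parameter values (sigma = 10, rho = 28, beta = 8/3, dt = 0.01) and the initial state are assumptions.

```python
import math
import struct


def round_nearest_even(x: float, sbits: int) -> float:
    """Round x to `sbits` explicit significand bits, ties to even."""
    drop = 52 - sbits
    if drop <= 0 or not math.isfinite(x):
        return x
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    low = b & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    b -= low
    if low > half or (low == half and (b >> drop) & 1):
        b += 1 << drop
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def lorenz(steps, sbits=None, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Forward-Euler Lorenz system with every intermediate result rounded.

    sbits=None runs in plain binary64 with no rounding at all.
    """
    if sbits is None:
        def r(v):
            return v
    else:
        def r(v):
            return round_nearest_even(v, sbits)
    dt, sigma, rho, beta = r(dt), r(sigma), r(rho), r(beta)
    x = y = z = r(1.0)
    for _ in range(steps):
        dx = r(sigma * r(y - x))
        dy = r(r(x * r(rho - z)) - y)
        dz = r(r(x * y) - r(beta * z))
        x, y, z = r(x + r(dt * dx)), r(y + r(dt * dy)), r(z + r(dt * dz))
    return x, y, z
```

Rounding to 52 bits leaves every value unchanged, so `lorenz(n, 52)` reproduces the double-precision run exactly, whereas with 10 bits the trajectory already differs after the first step.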

In this section, we present a precision analysis for a shallow-water model in
a Munk double-gyre configuration, a typical medium-sized model set-up that is
often studied in the development of atmosphere and ocean models. The model
under investigation uses a standard approach to implementing the shallow-water
equations on an Arakawa C grid.

The prognostic variables of the shallow-water model evolve according to the following equations:

Snapshot of the surface height field for simulations of the shallow-water
model.

Time-mean surface height field for simulations of the shallow-water
model calculated over 15 million time steps.

A double-precision simulation serves as the reference, and we aim to reproduce it as closely as possible using reduced precision. A first-order estimate of the effect of reduced precision is obtained by running a single-precision simulation using the native 32 bit representation (a version emulating 23 bit precision has also been run and is of similar quality). Two reduced-precision runs are produced using rpe: one with 15 bits in the floating-point significand and one with 10 bits (the number of significand bits in an IEEE half-precision float).

Each simulation is spun up for 5 million time steps to reach equilibrium and
then integrated for a further 15 million time steps.
Figure

Time stepping of a prognostic variable with mixed precision using a simple forward-in-time scheme.
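The figure referred to by this caption is not reproduced here. As a hedged illustration of why the precision of a prognostic variable matters in a forward-in-time scheme, the Python sketch below holds the state either at 10 significand bits or in double precision while always computing the increment at 10 bits. The function names and the values h0 = 1000, dt = 0.01 and a constant tendency of 1 are assumptions chosen for illustration, not the configuration of the shallow-water model.

```python
import math
import struct


def round_nearest_even(x: float, sbits: int) -> float:
    """Round x to `sbits` explicit significand bits, ties to even."""
    drop = 52 - sbits
    if drop <= 0 or not math.isfinite(x):
        return x
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    low = b & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    b -= low
    if low > half or (low == half and (b >> drop) & 1):
        b += 1 << drop
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def integrate(steps, state_sbits, tend_sbits, h0=1000.0, tendency=1.0, dt=0.01):
    """Forward-in-time stepping h <- h + dt * tendency.

    state_sbits and tend_sbits set the significand bits used for the prognostic
    variable and for the increment; None means plain binary64.
    """
    def r(v, sbits):
        return v if sbits is None else round_nearest_even(v, sbits)

    h = r(h0, state_sbits)
    dt_r = r(dt, tend_sbits)
    tend_r = r(tendency, tend_sbits)
    for _ in range(steps):
        increment = r(dt_r * tend_r, tend_sbits)
        h = r(h + increment, state_sbits)
    return h
```

With the state held at 10 bits, representable values near 1000 are 0.5 apart, so each increment of about 0.01 is rounded away and the state never changes. Holding the state in double precision while the increment stays at 10 bits gives a result close to 1001.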

Figure

The shallow-water model provides a good context to discuss the effect of
rounding on conservation properties. The surface height integrated over the
model domain is conserved in the shallow-water equations. However, in a
numerical approximation, we expect that there will be some error introduced
due to truncation. For the reference double-precision simulation, the mean
surface height of the fluid changes by

A tool for running numerical simulations at emulated reduced floating-point precision has been described and documented. Its efficacy has been demonstrated on a simplified geophysical model, which in particular shows the value of using rpe to co-design a model for lower precision.

There are a few notable limitations of the emulator that may affect how it can be applied. The lack of support for emulating different exponent sizes may be relevant to certain applications. In principle, it would be possible to extend the software to emulate arbitrary exponent sizes smaller than the 11 bit exponent of IEEE double precision. This could be done by checking the magnitude of each truncated value as it is produced by the truncation procedure. However, these checks would add further computational overhead.
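The exponent check suggested above could look roughly like the following Python sketch, which first rounds the significand and then enforces the range of a format with `ebits` exponent bits. The function names are hypothetical, and flushing values below the smallest normal number to zero is a simplifying assumption; IEEE 754 would instead produce subnormal numbers (gradual underflow). The comparisons added for every value are the kind of extra cost mentioned above.

```python
import math
import struct


def round_nearest_even(x: float, sbits: int) -> float:
    """Round x to `sbits` explicit significand bits, ties to even."""
    drop = 52 - sbits
    if drop <= 0 or not math.isfinite(x):
        return x
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    low = b & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    b -= low
    if low > half or (low == half and (b >> drop) & 1):
        b += 1 << drop
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def limit_exponent(x: float, ebits: int, sbits: int) -> float:
    """Hypothetical range check for a format with `ebits` exponent bits."""
    bias = (1 << (ebits - 1)) - 1
    max_finite = math.ldexp(2.0 - math.ldexp(1.0, -sbits), bias)
    min_normal = math.ldexp(1.0, 1 - bias)
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x
    if abs(x) > max_finite:
        return math.copysign(math.inf, x)  # overflow
    if abs(x) < min_normal:
        return math.copysign(0.0, x)       # flush to zero (no subnormals)
    return x


def reduce_precision(x: float, ebits: int, sbits: int) -> float:
    """Round the significand first, then apply the exponent range check."""
    return limit_exponent(round_nearest_even(x, sbits), ebits, sbits)
```

With `ebits=5, sbits=10` (the IEEE half-precision layout), 65519 rounds to the largest finite value 65504, while 65520 is a tie that rounds up to 65536 and therefore overflows to infinity, as in binary16.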

It is also worth noting that a solution obtained with the emulator is
unlikely to be bit-for-bit identical to one obtained using native hardware
support for a given significand size. Several factors contribute to this,
such as the treatment of intrinsics as atomic operations when they may
actually be composed of multiple floating-point operations, and differences
in how compilers can optimize code that uses the emulator. The emulator
operates at the Fortran source level, before compiler optimizations, and
therefore code compiled with the emulator is likely to be optimized
differently from code that uses native reduced precision. For
example, the compiler may take advantage of a fused multiply–add (FMA)
operation that performs a multiplication and an addition operation in one step with a
single rounding at the end, as opposed to rounding an intermediate product to
a given number of significand digits before computing the addition and
rounding the final result. Such an optimization is unlikely to be available
when using the
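The effect of fusing can be illustrated in binary64 with Python, using exact rational arithmetic from the standard `fractions` module to stand in for the single rounding of an FMA instruction. The function names are hypothetical. The same effect arises at any precision, and code using the emulator corresponds to the two-rounding path, because each overloaded operation rounds its own result.

```python
from fractions import Fraction


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Round the product, then round the sum: two roundings (no fusion)."""
    product = a * b
    return product + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once, as an FMA instruction does."""
    # Fraction(float) is exact, and converting the result back to float
    # rounds it once to the nearest binary64 value.
    return float(Fraction(a) * Fraction(b) + Fraction(c))
```

For `a = 1 + 2**-30`, `b = 1 - 2**-30` and `c = -1.0`, the exact product `1 - 2**-60` rounds to 1.0 on its own, so the separately rounded result is 0.0, while the fused result is `-2**-60`.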

Using the emulator for reduced-precision simulations will introduce a
significant overhead for all floating-point operations and will result in
poorer performance. While the emulator is designed to be used to study a
possible precision reduction to allow simulations that run faster on real
hardware, it is important to realize that the focus of the software is to be
able to verify the output of a model at reduced precision rather than to
emulate the actual efficiency gains one might expect from reduced precision.
The overhead introduced by the emulator may vary depending on its
application. In our experience, the overhead of the library in terms of
execution time is at least a factor of 10. Tests on the simple chaotic
dynamical system of

The rpe software can, in principle, be used as a basis to emulate other
hardware set-ups, such as hardware that occasionally produces bit flips.
For most such set-ups, replacing the function that performs the bit
truncation will be sufficient. It should also be possible to adapt the
emulator to mimic the behaviour of specific error patterns of specific
reduced-precision hardware.
The rpe software is freely available under the
Apache 2.0 license and may be accessed at

A. Dawson and P. Düben developed the rpe software and prepared the manuscript. P. Düben developed and carried out the shallow-water model experiments.

The authors thank Mike Giles for the useful discussion of IEEE rounding and code review, and Tim Palmer and Sam Hatfield for the useful feedback and motivation. The authors received funding from ERC grant 291406 “Towards the Prototype Probabilistic Earth-System Model for Climate Prediction”. Edited by: S. Easterbrook Reviewed by: two anonymous referees