<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" dtd-version="3.0">
  <front>
    <journal-meta>
<journal-id journal-id-type="publisher">GMD</journal-id>
<journal-title-group>
<journal-title>Geoscientific Model Development</journal-title>
<abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">1991-9603</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>

    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-10-3297-2017</article-id><title-group><article-title>Development and performance of a new version of the OASIS coupler,
OASIS3-MCT_3.0</article-title>
      </title-group><?xmltex \runningtitle{Development and performance of a new version of the OASIS coupler}?><?xmltex \runningauthor{A.~Craig et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Craig</surname><given-names>Anthony</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Valcke</surname><given-names>Sophie</given-names></name>
          <email>valcke@cerfacs.fr</email>
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Coquart</surname><given-names>Laure</given-names></name>
          
        </contrib>
        <aff id="aff1"><institution>CECI, Université de Toulouse, CNRS, CERFACS, 42 Av. G. Coriolis,
31057 Toulouse Cedex 01, France</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Sophie Valcke (valcke@cerfacs.fr)</corresp></author-notes><pub-date><day>8</day><month>September</month><year>2017</year></pub-date>
      
      <volume>10</volume>
      <issue>9</issue>
      <fpage>3297</fpage><lpage>3308</lpage>
      <history>
        <date date-type="received"><day>13</day><month>March</month><year>2017</year></date>
           <date date-type="rev-request"><day>7</day><month>April</month><year>2017</year></date>
           <date date-type="rev-recd"><day>2</day><month>July</month><year>2017</year></date>
           <date date-type="accepted"><day>24</day><month>July</month><year>2017</year></date>
      </history>
      <permissions>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/3.0/">https://creativecommons.org/licenses/by/3.0/</ext-link></license-p>
</license>
</permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/10/3297/2017/gmd-10-3297-2017.html">This article is available from https://gmd.copernicus.org/articles/10/3297/2017/gmd-10-3297-2017.html</self-uri>
<self-uri xlink:href="https://gmd.copernicus.org/articles/10/3297/2017/gmd-10-3297-2017.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/10/3297/2017/gmd-10-3297-2017.pdf</self-uri>


      <abstract>
    <p>OASIS is coupling software developed primarily for use in the
climate community. It provides the ability to couple different
models<fn id="Ch1.Footn1"><p>Within the text, we use “model” in the sense of a
“numerical model”.</p></fn> with low implementation and performance overhead.
OASIS3-MCT is the latest version of OASIS. It includes several improvements
compared to <?xmltex \hack{\mbox\bgroup}?>OASIS3<?xmltex \hack{\egroup}?>, including elimination of a separate hub coupler process,
parallelization of the coupling communication and run-time grid
interpolation, and the ability to easily reuse mapping weight files.
OASIS3-MCT_3.0 is the latest release and includes the ability to couple
between components running sequentially on the same set of tasks as well as
to couple within a single component between different grids or decompositions
such as physics, dynamics, and I/O. OASIS3-MCT has been tested with different
configurations on up to 32 000 processes, with components running on
high-resolution grids with up to 1.5 million grid cells, and with over
10 000 <?xmltex \hack{\mbox\bgroup}?>2-D<?xmltex \hack{\egroup}?> coupling fields. Several new features will be available in
OASIS3-MCT_4.0, and some of those are also described.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <title>Introduction</title>
      <p>OASIS is coupling software developed primarily for the climate community.
OASIS was originally an abbreviation for “Ocean Atmosphere Sea Ice Soil”,
but the capabilities provided by OASIS are not restricted to just those kinds
of models, so the name OASIS now represents a project to develop general
coupling software. It is in relatively wide use, especially in European-based
modeling efforts. It is one of a number of coupling infrastructure packages
(Valcke et al., 2016) that are focused on standard and reusable methods to
support coupling requirements like interpolation and communication of data
between different models and different grids. <?xmltex \hack{\mbox\bgroup}?>OASIS<?xmltex \hack{\egroup}?> is maintained and managed
by the Centre Européen de Recherche et de Formation Avancée en Calcul
Scientifique (CERFACS) and the Centre National de la Recherche Scientifique
(CNRS) in France. It is a portable set of Fortran 77, Fortran 90, and C
routines. Low intrusiveness, portability, and flexibility are key OASIS
design concepts. The current version of the software, OASIS3-MCT, is a
coupling library that is compiled and linked to the component models. Its
primary purpose is to interpolate and exchange the coupling fields between or
within components to form a coupled system. OASIS3-MCT supports coupling of
fields on grid types commonly used in climate science via a put–get
approach, which means components make subroutine calls to send (put) or
receive (get) data from within the component code directly. A separate
top-level driver to control system sequencing is not required to use
OASIS3-MCT, but a handful of subroutine calls must be added to the code to
initialize the coupling, define grids, define decompositions (partitions),
define coupling fields, and put and get variables between components.
OASIS3-MCT leverages a text input file called the <italic>namcouple</italic> file to
configure the interactions between components. Mapping (also known as
remapping, regridding, or interpolation), time transformations, and the
ability to read or write coupling data from disk are supported in OASIS3-MCT.</p>
      <p>OASIS development began in 1991 and the first version, OASIS1, was used 2
years later in a 10-year coupled integration of the tropical Pacific (Terray
et al., 1995). In the intervening decades, OASIS2 and OASIS3 were released.
The history of OASIS development is well documented (Valcke, 2013). With
OASIS3, the coupled models always had to run concurrently as separate
executables on different MPI tasks and all coupling fields passed through a
separate central hub coupler component that also ran concurrently. OASIS3
allowed parallel coupling of parallel models on a per-field basis by
gathering each parallel field in the source model to a single process on the
hub where operations such as mapping and time averaging were executed, and
the field was then scattered to the destination model. OASIS3 generated
mapping weights on a single process at initialization using the SCRIP library
(Jones, 1999) from the grid information specified by the component models.</p>
      <p>A first attempt to design and develop a fully parallel coupler was started in
the framework of the EU FP5 PRISM and FP7 IS-ENES1 projects (see
<uri>https://is.enes.org</uri>), and that led to the development of OASIS4 (Redler
et al., 2010). In particular, OASIS4 included a library that performed a
parallel calculation for generation of the mapping weights and addresses
needed for the interpolation of the coupling fields. This version had several
other features such as the use of an XML file for specifying the
configuration information. OASIS4 was used by Météo-France, ECMWF,
KNMI, and MPI-M in the framework of the EU GEMS project for <?xmltex \hack{\mbox\bgroup}?>3-D<?xmltex \hack{\egroup}?> coupling
between atmospheric dynamic and atmospheric chemistry models (Hollingsworth
et al., 2008); it was also used by SMHI, AWI, and the BoM in Australia for
ocean–atmosphere 2-D regional and global coupling. But OASIS4 had limited
success and its development was stopped in 2011 after a performance analysis
determined some fundamental weaknesses in its design, in particular with
respect to the support of unstructured grids.</p>
      <p>With OASIS3-MCT, a different approach was taken to improve the parallel
performance and to address new requirements. It extends the widely used and
distributed OASIS3 version of the coupler. This paper describes the development
of OASIS3-MCT from OASIS3 to the current 3.0 release and also introduces some
new features expected in the next 4.0 release. The initial requirements of
OASIS3-MCT were to improve the parallel performance of the coupling,
implement an ability to read in mapping weights to mitigate the cost of
weight generation, support next-generation grids such as high-resolution
unstructured grids running on high processor counts, and to add those
features while retaining the basic OASIS3 application programming interfaces
(APIs) and <italic>namcouple</italic> file to support backwards compatibility.</p>
      <p>To meet these requirements, a number of changes were made. First, a portion
of the underlying communication implementation was replaced with the Model
Coupling Toolkit (MCT) software package (Larson et al., 2005) developed by
the Argonne National Laboratory. This implementation is transparent to the
user, as MCT methods and data types are only used within the OASIS3-MCT
infrastructure to support parallel mapping and parallel redistribution.
Second, the ability to specify pre-defined mapping files was added. Mapping
files can now be generated offline using a diverse set of packages, such as
SCRIP, ESMF (Theurich et al., 2016), or any locally developed methods. Third,
the OASIS3 hub coupler was deprecated and is no longer needed or implemented.
Transforms are carried out on the component processes, and data
are transferred directly between components via MCT. These features
were released in OASIS3-MCT_1.0 in 2012 (Valcke et al., 2012) and, because
of backwards compatibility, OASIS3 users could upgrade easily to OASIS3-MCT.</p>
      <p>With the release of OASIS3-MCT_3.0 in 2015 (Valcke et al., 2015), several
new features were added to the coupler. OASIS3-MCT_3.0 extends the ability
to couple components running concurrently and adds support for coupling
within a component for grids and fields defined on overlapping or partially
overlapping sets of tasks, such as between physics and dynamics modules
within an atmospheric model or to and from an I/O module. OASIS3-MCT_3.0
also allows a component to define grids, partitions, and coupling fields on
subsets of its tasks, and it comes with a graphical user interface (GUI) to
generate the <italic>namcouple</italic> file.</p>
      <p>Section 2 (Implementation) describes these features in greater
detail. Section 3 provides performance and memory scaling results from
OASIS3-MCT_3.0 as well as some initial results for features
expected in OASIS3-MCT_4.0, and Sect. 4 provides
conclusions and a summary.</p>
</sec>
<sec id="Ch1.S2">
  <title>Implementation</title>
      <p>As discussed in the introduction, OASIS3-MCT development started with the
objective of keeping the general OASIS3 design. The requirements of OASIS3-MCT
were focused on improved parallel performance, including parallel mapping and
parallel data coupling, the ability to efficiently support unstructured
grids, the ability to specify pre-defined mapping files to mitigate the
serial cost of generating mapping weights on the fly, and backwards
compatibility in usage of both the <italic>namcouple</italic> file and the OASIS3
APIs. A summary of the changes between OASIS3 and OASIS3-MCT_3.0 is provided
in Appendix A as well as an initial list of features expected in
OASIS3-MCT_4.0.</p>
<sec id="Ch1.S2.SS1">
  <title>General architecture</title>
      <p>To accomplish these tasks efficiently and in a timely manner, the MCT
developed by the Argonne National Laboratory (Larson et al., 2005) was
incorporated into OASIS3 to support parallel matrix vector multiplication and
parallel distributed exchanges. Its design philosophy, based on flexibility
and minimal invasiveness, is consistent with the approach taken in OASIS. MCT
has proven parallel performance and is one of the underlying coupling
software libraries used in the National Center for Atmospheric Research
Community Earth System Model (NCAR CESM) (Jacob et al., 2005; Craig et al.,
2012).</p>
      <p><?xmltex \hack{\newpage}?>MCT handles two primary tasks in OASIS3-MCT: the parallel transfer of data
from a source model to a destination model, and interpolation of fields
between decomposed grids. At present, these two steps are independent, and at
moderate to high processor counts the performance of both is largely limited
by the MPI communication cost of the data rearrangement that each involves.
Data communication and mapping rearrangement are handled internally in
OASIS3-MCT via MCT routers.</p>
      <p>Another significant change in the OASIS3-MCT implementation compared to
OASIS3 is that a separate hub coupler executable running on its own
processes is no longer needed. Accumulation, temporal lagging, mapping, and
other transforms are carried out in the OASIS3-MCT coupling layer on the
model processes in parallel using temporary memory to store data as needed.
Compared to OASIS3, which required an all-to-one communication,
interpolation on the single hub process, and a one-to-all communication to
couple fields, OASIS3-MCT requires just one parallel all-to-all
communication between the source and destination processes and one parallel
mapping which includes a rearrangement of the data on the source or
destination processes. In addition, the memory needed in the infrastructure
in OASIS3-MCT is much more scalable.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <title>Coupling</title>
      <p>OASIS3-MCT fundamentally supports coupling of 2-D logically rectangular
fields, but 3-D fields and 1-D fields are also supported using a 1-D
degeneration of the grid structure. If the user provides a set of
pre-calculated weights, OASIS3-MCT can interpolate any type of 1-D, 2-D, or
3-D field, but the coupler itself can only calculate mapping weights for 2-D
fields on the sphere.</p>
      <p>Another new feature is the option to couple multiple fields as a single
coupling operation. This is supported for fields for which the coupling
options defined in the <italic>namcouple</italic> file are identical. This can
improve performance because rather than mapping and coupling fields one at a
time, the mapping and coupling can be aggregated over multiple fields.
Coupling multiple fields at once is accomplished by specifying a list of
colon-delimited fields in the <italic>namcouple</italic> file on both the source and
destination sides. In this implementation, the get and put calls in the model
are still individual calls on individual fields, but the coupling layer will
aggregate the multiple fields specified in the <italic>namcouple</italic> file into a
single step. On the put side, the multiple fields are not mapped or sent
until all of the individual put calls are made. On the get side, the multiple
fields are received and mapped on the first get call and then subsequent get
calls just copy in fields that were received earlier. A user can quickly
switch between coupling single and multiple fields just by changing the
<italic>namcouple</italic> input file.</p>
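      <p>The aggregation behaviour described above can be illustrated with a
minimal Python sketch. It is a toy model rather than the OASIS3-MCT
implementation or API: the class <italic>AggregatedCoupling</italic>, its methods, and the
field names used with it are hypothetical, and the "send" is only an in-memory
list. The sketch shows that nothing is sent until every field of the group has
been put, and that the first get receives the whole bundle while later gets
only copy data out of it.</p>
      <preformat><![CDATA[
```python
class AggregatedCoupling:
    """Toy model of several fields coupled as one aggregated operation.

    Hypothetical helper for illustration only; not the OASIS3-MCT API.
    """

    def __init__(self, field_names):
        self.field_names = list(field_names)
        self._put_buffer = {}   # fields put so far in the current coupling step
        self._in_flight = []    # bundles sent but not yet received
        self._get_buffer = {}   # bundle received on the first get of a step
        self.messages_sent = 0

    def put(self, name, data):
        if name not in self.field_names:
            raise KeyError(name)
        self._put_buffer[name] = data
        # Nothing is mapped or sent until every field of the group has been put.
        if len(self._put_buffer) == len(self.field_names):
            self._in_flight.append(dict(self._put_buffer))
            self._put_buffer.clear()
            self.messages_sent += 1

    def get(self, name):
        # The first get of a step receives the whole bundle;
        # subsequent gets only copy out fields received earlier.
        if not self._get_buffer:
            self._get_buffer = self._in_flight.pop(0)
        return self._get_buffer.pop(name)
```
]]></preformat>
      <p>In this sketch, two puts on a two-field group produce a single message,
whereas coupling the same fields individually would produce two.</p>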
      <p>One additional feature, available in the current development version and
to be released with the next official version, OASIS3-MCT_4.0, is the ability
to couple a bundle of 2-D fields via extensions to the OASIS calling
interfaces. An extra dimension is supported in the variable definition and in
the get and put field arrays. In this case, a user can treat a bundled 2-D
field as a single field in the system, while the underlying implementation
treats it just like a multiple field coupling.</p>
</sec>
<sec id="Ch1.S2.SS3">
  <title>Interpolation</title>
      <p>Mapping weight files can either be read directly or generated at run-time, on
one processor, using the same serial method based on SCRIP as existed in
OASIS3. In OASIS3-MCT, the weights are read serially by the root process and
distributed to the other processes in chunks, currently 100 000 weights at a
time, to limit memory use on the root process. For the
interpolation, OASIS3-MCT creates a simple 1-D decomposition of the source
grid on the destination processes or vice versa. Fields are then either
remapped to the destination grid on the source processes and then sent to the
destination processes or sent to the destination processes and then remapped
to the destination grid. The user is able to specify whether the source or
destination processes are used for remapping via an optional setting in the
<italic>namcouple</italic> file. That choice will generally be made based on mapping
performance and depends on the relative size of the grids, the number of
weights, and the process counts of the source and destination models. In
OASIS3-MCT_4.0, a new option is expected that may reduce the mapping
rearrangement cost by choosing a more efficient decomposition of the source
grid on the destination processes (or vice versa) compared to the current
default 1-D decomposition.</p>
      <p>Users also have an additional option to set the implementation of the
underlying mapping algorithm. The <italic>bfb</italic> option will enforce an order of
operations that will be bit-for-bit identical on different process counts. It
does this by distributing the mapping weights on the destination
decomposition and then redistributing the source coupling field grid point
values to the destination processes before applying the mapping weights. This
ensures operation order is independent of decomposition. The <italic>sum</italic> option
does the opposite. It distributes the mapping weights on the source
decomposition and then computes partial sums of the destination field on the
source decomposition, before rearranging them to the destination
decomposition and adding up the partial sums. This does not guarantee
identical order of operations on different process counts and decompositions.
In both approaches, the same number of floating-point operations is carried out, as
defined by the mapping weights. The main difference between the <italic>bfb</italic> and
<italic>sum</italic> strategies is that in <italic>bfb</italic> mode, the source field is rearranged onto
the destination distribution before the mapping weights are applied, while in
<italic>sum</italic> mode, the mapping weights are applied on the source decomposition to
form partial sums of the destination field, and then the partial sums are
rearranged. From the performance point of view, it is generally better to use
the method that rearranges the field on the grid that contains the fewest
grid cells, to minimize the communication cost. But, of course, if
bit-for-bit reproducibility on different core counts is required, then the
<italic>bfb</italic> mode should be chosen.</p>
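      <p>The difference between the two strategies can be sketched in a few lines
of serial Python that imitate a decomposition over several processes. This is
an illustrative sketch under simplifying assumptions, not the MCT
implementation: the function names, the random sparse weights, and the simple
1-D block decomposition are all hypothetical. In <italic>map_bfb</italic>, each destination
point is accumulated by a single process in the global order of the weights,
so the result does not depend on the process count. In <italic>map_sum</italic>, each source
process forms partial sums that are added afterwards, so the grouping of the
additions changes with the decomposition.</p>
      <preformat><![CDATA[
```python
import random


def make_weights(n_src, n_dst, links_per_dst, seed=0):
    """Build hypothetical sparse mapping weights as (dst, src, weight) triples."""
    rng = random.Random(seed)
    weights = []
    for d in range(n_dst):
        for s in rng.sample(range(n_src), links_per_dst):
            weights.append((d, s, rng.random()))
    return weights


def block_owner(index, n_points, n_procs):
    """Owner of a grid point in a simple 1-D block decomposition."""
    block = -(-n_points // n_procs)  # ceiling division
    return index // block


def map_bfb(weights, src, n_dst, n_procs):
    """bfb-like mapping: weights live on the destination decomposition.

    Source values are assumed to have been redistributed already, so each
    destination point is summed by one process in the global weight order.
    """
    dst = [0.0] * n_dst
    for p in range(n_procs):
        for d, s, w in weights:
            if block_owner(d, n_dst, n_procs) == p:
                dst[d] += w * src[s]
    return dst


def map_sum(weights, src, n_src, n_dst, n_procs):
    """sum-like mapping: weights live on the source decomposition.

    Each source process builds partial sums of the destination field, which
    are then rearranged and added together.
    """
    partial = [[0.0] * n_dst for _ in range(n_procs)]
    for d, s, w in weights:
        partial[block_owner(s, n_src, n_procs)][d] += w * src[s]
    dst = [0.0] * n_dst
    for p in range(n_procs):
        for d in range(n_dst):
            dst[d] += partial[p][d]
    return dst
```
]]></preformat>
      <p>With these definitions, <italic>map_bfb</italic> returns identical values for any
process count, while <italic>map_sum</italic> agrees with it only to within round-off.</p>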
</sec>
<sec id="Ch1.S2.SS4">
  <title>Conservation</title>
      <p>With OASIS3-MCT, the optional CONSERV transform has been refactored. In
OASIS3, this operation was always performed on a single process. In
OASIS3-MCT, this operation is now performed in parallel on the source or
destination processes. The CONSERV operation computes global sums of the
source and destination fields and applies corrections to the decomposed
mapped field in order to conserve area-integrated field quantities. There are
two options for computing the global sums in OASIS3-MCT_3.0. The first,
<italic>bfb</italic>, gathers the fields onto the root process to compute the global sums in
an ordered fashion that guarantees bit-for-bit identical results regardless
of the number of cores or decomposition of the field. (Note that both the
CONSERV operation and the underlying mapping algorithm setting share a common
flag, <italic>bfb</italic>, but these two settings are completely independent.) The second
CONSERV option, <italic>opt</italic>, carries out a local double precision sum of the field
and then does a scalar reduction to generate the global sums. This will
typically introduce a round-off difference in the results when changing
process counts or decomposition, but is much faster. However, the <italic>opt</italic>
option will be bit-for-bit reproducible if the same number of processes and
decomposition are used between different runs.</p>
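      <p>The trade-off between the two global sum options can be made concrete
with a small Python sketch. It only imitates the decomposition serially, and
the function names and test values are hypothetical; the values are
deliberately ill-conditioned so that round-off becomes visible. The ordered
sum after a gather (as in <italic>bfb</italic>) gives the same result for any decomposition,
whereas local sums followed by a reduction (as in <italic>opt</italic>) can change when the
number of processes changes.</p>
      <preformat><![CDATA[
```python
def split(field, n_procs):
    """Split a global field into contiguous local pieces, one per process."""
    block = -(-len(field) // n_procs)  # ceiling division
    return [field[i:i + block] for i in range(0, len(field), block)]


def global_sum_bfb(local_pieces):
    """Gather all pieces in global order, then sum serially on the root."""
    gathered = [x for piece in local_pieces for x in piece]
    total = 0.0
    for x in gathered:
        total += x
    return total


def global_sum_opt(local_pieces):
    """Sum locally in double precision, then reduce the local sums."""
    local_sums = []
    for piece in local_pieces:
        s = 0.0
        for x in piece:
            s += x
        local_sums.append(s)
    total = 0.0
    for s in local_sums:  # stands in for the scalar reduction
        total += s
    return total


# Hypothetical, deliberately ill-conditioned field values.
FIELD = [1.0e16, 1.0, -1.0e16, 1.0]
```
]]></preformat>
      <p>For this field, the gathered sum is the same on one, two, or four
processes, while the locally summed result on two processes differs from the
result on one process.</p>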

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><caption><p>A schematic of the coupling capability in OASIS3-MCT_3.0. In this
example, there are two executables, exe1 and exe2. exe2 has three components, comp2, comp3, and
comp4, and comp3 has three grids, grid3, grid4, and grid5; comp4 is not
involved in any coupling in this case. The executables, components, and grids
are laid out across different tasks. Arrows indicate different coupling
capabilities: A, D, E, and J between different components in different
executables; B, F, and I in a single executable between different components
with different grids; C between different grids in a single component on
non-overlapping tasks; G between different grids in a single component on
partially overlapping tasks; and H between different grids in a single
component on partially overlapping and partially non-overlapping tasks.</p></caption>
          <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/3297/2017/gmd-10-3297-2017-f01.pdf"/>

        </fig>

      <p>In the OASIS3-MCT_4.0 release, three new options (<italic>lsum16</italic>, <italic>ddpdd</italic>, and
<italic>reprosum</italic>) will be added to compute the global sums in CONSERV. At the same
time, <italic>opt</italic> will be renamed <italic>lsum8</italic>, while <italic>bfb</italic> will be renamed <italic>gather</italic>.
The rest of this paper will use the OASIS3-MCT_4.0 naming convention for
CONSERV options. The first new global sum method, <italic>lsum16</italic>, works just like
<italic>lsum8</italic> but uses quadruple precision to compute the local sums and to carry
out the scalar reduction. The cost will be higher than for <italic>lsum8</italic>, but there is
a greater chance that results will be bit-for-bit identical across different
decompositions. The second new method, <italic>ddpdd</italic>, is a parallel double–double
algorithm using a single scalar reduction (He and Ding, 2001). Its performance
and reproducibility are expected to fall between those of <italic>lsum8</italic> and <italic>lsum16</italic>.
The third new algorithm, <italic>reprosum</italic>, is a fixed-point method
based on ordered double integer sums that requires two scalar reductions per
global sum (Mirin and Worley, 2012). The cost of <italic>reprosum</italic> will be higher than
some of the other methods, but it is expected to produce bit-for-bit results
on different task counts except in extremely rare cases, and the cost should
be significantly less than the <italic>gather</italic> method.</p><?xmltex \hack{\newpage}?>
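      <p>The idea behind a double–double sum can be sketched as follows. This is
a simplified Python illustration of the general technique, not the algorithm
as implemented in OASIS3-MCT or as published by He and Ding (2001): each
partial sum is carried as a pair of doubles (a value and its accumulated
round-off error), and the pairs are combined in a loop that stands in for a
single reduction with a custom operator. All function names and test values
are hypothetical.</p>
      <preformat><![CDATA[
```python
def two_sum(a, b):
    """Return (s, e) where s is the rounded sum and e its exact round-off error."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double numbers given as (high, low) pairs."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)


def split(field, n_procs):
    """Split a global field into contiguous local pieces, one per process."""
    block = -(-len(field) // n_procs)  # ceiling division
    return [field[i:i + block] for i in range(0, len(field), block)]


def global_sum_lsum8(local_pieces):
    """Plain double precision local sums followed by a reduction."""
    total = 0.0
    for piece in local_pieces:
        s = 0.0
        for x in piece:
            s += x
        total += s
    return total


def global_sum_dd(local_pieces):
    """Double-double local sums combined with a double-double reduction."""
    local = []
    for piece in local_pieces:
        acc = (0.0, 0.0)
        for x in piece:
            acc = dd_add(acc, (x, 0.0))
        local.append(acc)
    total = (0.0, 0.0)
    for acc in local:  # stands in for one reduction with a custom operator
        total = dd_add(total, acc)
    return total[0] + total[1]


# Hypothetical, deliberately ill-conditioned field values; the exact sum is 2.
FIELD = [1.0e16, 1.0, -1.0e16, 1.0]
```
]]></preformat>
      <p>For these values, the double–double sum returns the exact result on one,
two, and four processes, while the plain double precision sum changes with the
decomposition. Double–double arithmetic extends the precision but does not
guarantee reproducibility in general.</p>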
</sec>
<sec id="Ch1.S2.SS5">
  <title>Concurrency, process layout, and sequencing</title>
      <p>The ability to couple fields within one executable running on partially
overlapping tasks was added in OASIS3-MCT_3.0. A number of
new capabilities had to be implemented to support this feature including the
ability to define grids, partitions, and coupling fields on subsets of
component tasks. There also had to be a major update in the handling of MPI
communicators within the infrastructure. These changes are transparent to
the user. This allows, within a single model, different sets of MPI tasks to
define multiple grids, multiple decompositions (partitions), and different
coupling fields. These new features and updates provide the flexibility
needed to couple fields between components or within a component.</p>
      <p>Figure 1 provides a schematic of the type of coupling that can be carried
out between and within components in OASIS3-MCT_3.0.
Executables are defined as separate binaries that are launched independently
at startup, components are defined as separate sets of tasks within an
executable, and grids can be defined on all tasks or on a subset of tasks
within a component. Each task will be associated with only one executable
and one component in any application, but multiple grids and decompositions
can exist across overlapping tasks within a component. While OASIS3-MCT
supports both single and multiple executable configurations, the coarsest
level of concurrency in the system is the component.</p>
      <p>In Fig. 1, an example schematic is presented that shows how two executables,
exe1 and exe2, run concurrently on separate sets of MPI tasks (0–5 for exe1
and 6–37 for exe2). Executable exe1 includes only one component, comp1, that
has coupling fields defined on only one grid, grid1 (decomposed on all six
tasks). Executable exe2 includes three components, comp2, comp3, and comp4,
running concurrently on tasks 6–11, 12–33, and 34–37, respectively.
Component comp2 participates in the coupling, with fields defined on only one
grid, grid2 (decomposed on all six tasks), while comp4 does not participate
in the coupling. Component comp3 exchanges coupling fields defined on three
different grids, grid3 (tasks 12–21), grid4 (tasks 22–30), and grid5 (tasks
12–26, overlapping with both grid3 and grid4). Finally, comp3 has three
tasks (31–33) not involved in the coupling. Different coupling capabilities
are indicated by the differently lettered arrows in Fig. 1. Coupling is
supported between components in separate executables, within a single
executable between different components, and between overlapping,
non-overlapping, or partially overlapping grids in a single component. In
OASIS3, only coupling between separate executables was supported; in
OASIS3-MCT_3.0, a functional and highly flexible coupled system can now be
designed and implemented as either a single executable or with multiple
executables.</p>
      <p>Within OASIS, it has always been mandatory for a user to establish a set of
configuration inputs that are consistent with the get and put sequencing in
the components such that the coupled system will not deadlock. OASIS3-MCT
provides some new capabilities to detect potential deadlocks before they
occur, but it is still largely up to the user to make sure this does not
happen. This is even more important for coupling components on overlapping
tasks as there is almost no way to detect a deadlock ahead of time.
Specifically, a field put routine must be called before the matching get
(taking into account any lags specified in the configuration file) when
coupling on overlapping tasks. In OASIS3-MCT, puts are generally
non-blocking, while gets are blocking. More specifically, a put waits for the
completion of the put of the same coupling field at the previous coupling
time step before proceeding in order to prevent puts from queuing up in MPI
and using excess memory. In other words, for a specific put–get pair, the
last put can never be more than one coupling period ahead of the equivalent
get in OASIS3-MCT. This means that the puts and gets have to be interleaved
when coupling on overlapping tasks. It is not possible to queue up a series
of puts over multiple coupling periods before executing the equivalent gets.</p>
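      <p>The sequencing constraint can be illustrated with a toy Python model of a
single put–get pair on overlapping tasks. It is a sketch under simplifying
assumptions and does not use the OASIS3-MCT API: the class <italic>CouplingChannel</italic>
and the function <italic>run_interleaved</italic> are hypothetical, lags are ignored, and a
call that would block forever on a single task is reported as an error instead
of hanging. A one-slot queue models the rule that a put cannot get more than
one coupling period ahead of the matching get.</p>
      <preformat><![CDATA[
```python
import queue


class CouplingChannel:
    """Toy model of one put-get pair executed on the same (overlapping) tasks."""

    def __init__(self):
        # One slot: at most one put may be outstanding for a given field.
        self._slot = queue.Queue(maxsize=1)

    def put(self, step, data):
        try:
            self._slot.put_nowait((step, data))
        except queue.Full:
            # On overlapping tasks nobody else can run the matching get,
            # so waiting here would deadlock.
            raise RuntimeError(
                f"put at step {step} would deadlock: previous put not received"
            ) from None

    def get(self, step):
        try:
            _sent_step, data = self._slot.get_nowait()
        except queue.Empty:
            # A blocking get with no earlier put would wait forever.
            raise RuntimeError(
                f"get at step {step} would deadlock: no matching put"
            ) from None
        return data


def run_interleaved(n_steps):
    """Interleave puts and gets, as required on overlapping tasks."""
    channel = CouplingChannel()
    received = []
    for step in range(n_steps):
        channel.put(step, step * 10.0)
        received.append(channel.get(step))
    return received
```
]]></preformat>
      <p>In this model, interleaved puts and gets complete normally, while two
consecutive puts of the same field, or a get issued before its put, are
reported as deadlocks.</p>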
</sec>
<sec id="Ch1.S2.SS6">
  <title>Other features</title>
      <p>There are several additional features in OASIS3-MCT relative to OASIS3. The
grid writing routines have been extended to support parallel calls from all
component processes. However, even when the parallel interface is used, the
grid information is still aggregated onto the root processor within the
OASIS3-MCT layer and then written serially to disk.</p>
      <p><?xmltex \hack{\newpage}?>OASIS3-MCT now also includes a GUI, which is an application of OPENTEA
(Dauptain, 2014), the graphical interface developed at CERFACS. The
OASIS3-MCT GUI helps users produce the <italic>namcouple</italic> configuration file
for a specific run, without worrying about the format syntax of the file.</p>
</sec>
</sec>
<sec id="Ch1.S3">
  <title>Performance</title>
      <p>This section summarizes the performance of various aspects of OASIS3-MCT_3.0
at low and high process counts and at moderate to high resolutions. The
performance and scaling of initialization, coupling, mapping, conservation,
and other features will be presented. Memory usage will also be shown.</p>
<sec id="Ch1.S3.SS1">
  <title>Initialization</title>
      <p>Figure 2 shows the initialization cost for a T799-ORCA025 test case on up to
16 000 MPI tasks per component, with the two components running concurrently
(32 000 tasks in total) on Curie at CEA TGCC. Curie consists of 5040 nodes
with two eight-core Intel Sandy Bridge EP (E5-2680) 2.7 GHz processors per
node connected with an InfiniBand QDR Full Fat Tree network. These tests were
run with simple toy models that define grids and couple test data but that
have practically no model initialization or run-time overhead. This
configuration was chosen because it demonstrates OASIS3-MCT's ability to
support high-resolution climate configurations. The T799 is a global
atmospheric Gaussian reduced grid with a <inline-formula><mml:math id="M1" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 25 km resolution and
843 490 grid points. The ORCA025 grid is a tripolar grid with <inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:mn mathvariant="normal">1442</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1021</mml:mn></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M3" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 1.47 million) grid points and is one of the grid
configurations used by the NEMO ocean model
(<uri>http://www.nemo-ocean.eu/</uri>). The OASIS3-MCT initialization consists of
several steps, including setting up the partitions, reading in and
distributing the mapping weights, computing the mapping rearrangement
communication patterns, and computing the coupling communication patterns.
Most of these operations rely heavily on MPI to define the interactions,
reconcile the coupling fields and decompositions, and set up the mapping and
coupling interactions. Multiple runs were performed for each number of cores,
with little variability in the timing measured. Based on the results in
Fig. 2, the total initialization time for OASIS3-MCT is likely to be
reasonable for most applications, even at high numbers of cores. Below 2000
MPI tasks per component, the OASIS3-MCT initialization time is less than
1 min. At 16 000 tasks per component, for this relatively high-resolution
configuration, the initialization time is below 7 min. Given this heavy use
of MPI, the initialization is not expected to scale well, but the work done at
initialization is what allows the model to run efficiently during the actual
run phase. There is clearly some concern that as task counts continue to
increase, the initialization time will continue to grow.
<?xmltex \hack{\mbox\bgroup}?>OASIS<?xmltex \hack{\egroup}?> developers continue to monitor and analyze both the
run-time and initialization costs.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F2"><caption><p>Initialization cost for the T799-ORCA025 toy model using
OASIS3-MCT_3.0 on Curie Bullx.</p></caption>
          <?xmltex \igopts{width=207.705118pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/3297/2017/gmd-10-3297-2017-f02.pdf"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS2">
  <title>Coupling</title>
      <p>Figure 3 shows the cost of a ping-pong coupling for the same configuration as
Fig. 2. The times are per single ping-pong coupling, but the test was done by
running and averaging 1000 ping-pongs. In a ping-pong test, data are passed
back and forth between the two components sequentially. In other words, data
are sent from model 1 and received by model 2, followed by different data
being sent from model 2 to model 1. Each coupling of data between a pair of
components consists of a mapping operation that interpolates the non-masked
data via a five-nearest-neighbor algorithm that includes both floating point
operations and rearrangement, and a communication operation that transfers
the data between the concurrent sets of MPI tasks of the two components. A
single ping-pong therefore involves four distinct MPI operations: a mapping
rearrangement and a data transfer in each direction. There are 4.5
million different links (weights) between the T799 grid points and the
ORCA025 grid points and 3 million weights for the mapping in the other
direction. In this case, scaling is good to about 400 cores per component as
the MPI cost is relatively small and the floating point operations associated
with the mapping dominate the cost. Between 400 and 4000 cores per component,
the ping-pong cost is relatively constant and, above 8000 cores per
component, the timing is degraded relative to lower core counts. At higher
core counts, the timing depends heavily on the MPI performance. At 8000 cores
per component, decompositions get relatively sparse, with just 100 to 200
grid points per core. In addition, timing variability between runs (not
shown) above 1000 cores and the jump in cost at 8000 cores suggest that
interconnect contention is likely a problem at these core counts. Equivalent
timings from OASIS3.3 are also shown in Fig. 3 (Valcke, 2013), and the
ping-pong time is about an order of magnitude shorter in OASIS3-MCT over a
large range of core counts.</p>
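      <p>As a rough check of the decomposition sparsity quoted above, the following Python sketch divides the two grid sizes given earlier by the core count. It assumes an idealized, perfectly even decomposition, which is a simplification and not necessarily how the toy models were decomposed; the function and variable names are illustrative only.</p>
<preformat><![CDATA[
```python
# Grid sizes quoted in the text for the two toy-model grids.
T799_POINTS = 843_490
ORCA025_POINTS = 1442 * 1021  # 1 472 282 points


def points_per_core(grid_points, cores):
    """Average grid points per core, assuming a perfectly even decomposition."""
    return grid_points / cores


for cores in (400, 4000, 8000):
    t799 = points_per_core(T799_POINTS, cores)
    orca = points_per_core(ORCA025_POINTS, cores)
    print(f"{cores:5d} cores: T799 {t799:7.1f}, ORCA025 {orca:7.1f} points per core")
```
]]></preformat>
      <p>Under this assumption, 8000 cores per component gives roughly 105 points per core for T799 and 184 for ORCA025, consistent with the range of 100 to 200 stated above.</p>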

      <?xmltex \floatpos{t}?><fig id="Ch1.F3"><caption><p>Comparison of the ping-pong (pipo) time for the T799-ORCA025 toy
model for OASIS3.3 and OASIS3-MCT_3.0 on Curie Bullx. The
time is averaged for a run where 1000 ping-pongs were carried out.</p></caption>
          <?xmltex \igopts{width=207.705118pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/3297/2017/gmd-10-3297-2017-f03.pdf"/>

        </fig>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><caption><p>Comparison of the ping-pong (pipo) time for the T799-ORCA025 toy
model on Lenovo on 360 cores with both the relative core count/component and
the mapping location varied. The time is in seconds for 1000 ping-pongs.
Columns <bold>(a)</bold> and <bold>(b)</bold> define the core count used for each
component of the toy model. Columns <bold>(c–f)</bold> are the pipo times for
four different mapping approaches: <bold>(c)</bold> mapping always on the source
cores, <bold>(d)</bold> mapping always on the destination cores, <bold>(e)</bold>
mapping on the ORCA025 cores, and <bold>(f)</bold> mapping on the T799 cores.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="justify" colwidth="28.452756pt"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="28.452756pt"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="56.905512pt"/>
     <oasis:colspec colnum="4" colname="col4" align="justify" colwidth="56.905512pt"/>
     <oasis:colspec colnum="5" colname="col5" align="justify" colwidth="56.905512pt"/>
     <oasis:colspec colnum="6" colname="col6" align="justify" colwidth="56.905512pt"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1"><bold>(a)</bold> ORCA025 cores</oasis:entry>  
         <oasis:entry colname="col2"><bold>(b)</bold> T799 <?xmltex \hack{\hfill\break}?>cores</oasis:entry>  
         <oasis:entry colname="col3"><bold>(c)</bold> pipo time<?xmltex \hack{\hfill\break}?>for mapping on <italic>src</italic> cores (s)</oasis:entry>  
         <oasis:entry colname="col4"><bold>(d)</bold> pipo time <?xmltex \hack{\hfill\break}?>for mapping on <?xmltex \hack{\hfill\break}?> <italic>dst</italic> cores (s)</oasis:entry>  
         <oasis:entry colname="col5"><bold>(e)</bold> pipo time <?xmltex \hack{\hfill\break}?>for mapping <?xmltex \hack{\hfill\break}?>on ORCA025<?xmltex \hack{\hfill\break}?>cores (s)</oasis:entry>  
         <oasis:entry colname="col6"><bold>(f)</bold> pipo time <?xmltex \hack{\hfill\break}?>for mapping on T799 cores (s)</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">24</oasis:entry>  
         <oasis:entry colname="col2">336</oasis:entry>  
         <oasis:entry colname="col3">5.10</oasis:entry>  
         <oasis:entry colname="col4">5.48</oasis:entry>  
         <oasis:entry colname="col5">7.29</oasis:entry>  
         <oasis:entry colname="col6">3.79</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">180</oasis:entry>  
         <oasis:entry colname="col2">180</oasis:entry>  
         <oasis:entry colname="col3">1.29</oasis:entry>  
         <oasis:entry colname="col4">1.54</oasis:entry>  
         <oasis:entry colname="col5">1.36</oasis:entry>  
         <oasis:entry colname="col6">1.36</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">336</oasis:entry>  
         <oasis:entry colname="col2">24</oasis:entry>  
         <oasis:entry colname="col3">4.70</oasis:entry>  
         <oasis:entry colname="col4">4.93</oasis:entry>  
         <oasis:entry colname="col5">1.91</oasis:entry>  
         <oasis:entry colname="col6">6.69</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <?xmltex \floatpos{t}?><fig id="Ch1.F4"><caption><p>OASIS3-MCT_3.0 T799-ORCA025 mapping time versus
core count per component on Lenovo. <italic>src</italic> and <italic>dst</italic> mapping are shown for both mapping
directions using the <italic>bfb</italic> algorithm based on tests where 1000 ping-pongs were
run.</p></caption>
          <?xmltex \igopts{width=207.705118pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/3297/2017/gmd-10-3297-2017-f04.pdf"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS3">
  <title>Interpolation</title>
      <p>One of the features of OASIS3-MCT is the ability to map data on either the
source or destination side as described in Sect. 2.3. Figure 4 shows the
timing of the mapping portion of the coupling, which includes both the floating
point application of weights and the necessary rearrangement of the data on
either the source processes (<italic>src</italic>) or the destination processes (<italic>dst</italic>) but
excludes the communication between the source and destination processes. Two
trials were carried out, and Fig. 4 shows the best times, with variability
generally much less than 5 % between runs. This test was run using the
T799-ORCA025 toy model on a Lenovo Xeon-based cluster at CERFACS consisting
of over 6000 2.5 GHz cores connected by an InfiniBand FDR network. Mapping is about
half the total cost of the ping-pong (not shown) in these cases. Figure 4
shows timing data for both mapping directions and for mapping done on the
source (<italic>src</italic>) or destination (<italic>dst</italic>) sides. In all cases, the <italic>bfb</italic>
algorithm is used. The mapping in this case scales well to several hundred
cores. In general, the T799 to ORCA025 mapping is more expensive
than the reverse, largely because there are more mapping weights
(4.5 vs. 3.0 million) to apply.</p>
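      <p>The floating point part of the mapping can be thought of as a sparse matrix-vector product in which each weight links one source point to one destination point. The sketch below is a serial, pure-Python illustration of that idea only. The function name and the (destination, source, weight) triplet layout are assumptions made for illustration and do not reflect the OASIS3-MCT or MCT data structures; the parallel rearrangement step is omitted, and the weights and field values are made up.</p>
<preformat><![CDATA[
```python
def apply_weights(links, src_field, n_dst):
    """Apply mapping weights given as (dst_index, src_index, weight) triplets."""
    dst_field = [0.0] * n_dst
    for dst, src, weight in links:
        # One multiply-add per link, so the cost grows with the number of weights.
        dst_field[dst] += weight * src_field[src]
    return dst_field


# Tiny made-up example: 3 source points mapped onto 2 destination points.
links = [(0, 0, 0.5), (0, 1, 0.5), (1, 1, 0.25), (1, 2, 0.75)]
src_field = [10.0, 20.0, 30.0]
print(apply_weights(links, src_field, n_dst=2))  # [15.0, 27.5]
```
]]></preformat>
      <p>Because one multiply-add is performed per link in this picture, a mapping with 4.5 million weights does about 1.5 times the floating point work of one with 3.0 million weights.</p>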
      <p>Table 1 documents the ping-pong time for 1000 trials for the same
T799-ORCA025 toy model test on Lenovo. In this case, the total number of
cores is held at 360, but the relative distribution of cores to each model is
varied in three test configurations. The ping-pong tests were carried out
with the mapping done on the source, the destination, the ORCA025, or the
T799 sets of cores. In these trials, the <italic>bfb</italic> map algorithm was used. In
this case, the best performance is obtained when the mapping is done on the model with
the highest core count because in this range of core counts, the mapping and
communication are still scaling. At higher core counts or with different
grids, the optimum performance may be different. For the current cases, the
best time is up to about 2.5 times faster (1.91 s vs. 4.70 s)
than the default setting of <italic>src</italic> and about 3.5 times faster (1.91 s vs. 6.69 s)
than the slowest setting. Another point is that if there is a large
disparity in the number of grid cells in the two grids, it should be better
to exchange the coupling fields expressed on the grid with the fewest grid
cells and perform the remapping on the other component tasks. In general, the
number of processes per component is going to be determined by the relative
cost of the scientific models, but the above analysis shows that for a given
task layout, there may be ways to reduce the coupling cost by mapping on the
tasks that provide the greatest performance.</p>
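      <p>The comparison of mapping locations can be reproduced directly from the values in Table 1. The short Python sketch below picks the fastest option for each core layout and reports the gain relative to the default <italic>src</italic> setting and to the slowest setting. The dictionary layout and function name are illustrative choices, not part of OASIS3-MCT.</p>
<preformat><![CDATA[
```python
# Ping-pong times in seconds from Table 1, keyed by (ORCA025 cores, T799 cores).
TABLE1 = {
    (24, 336): {"src": 5.10, "dst": 5.48, "ORCA025": 7.29, "T799": 3.79},
    (180, 180): {"src": 1.29, "dst": 1.54, "ORCA025": 1.36, "T799": 1.36},
    (336, 24): {"src": 4.70, "dst": 4.93, "ORCA025": 1.91, "T799": 6.69},
}


def best_option(times):
    """Return the mapping option with the smallest ping-pong time."""
    return min(times, key=times.get)


for layout, times in TABLE1.items():
    best = best_option(times)
    gain_vs_src = times["src"] / times[best]
    gain_vs_slowest = max(times.values()) / times[best]
    print(f"{layout}: best={best}, {gain_vs_src:.2f}x vs src, "
          f"{gain_vs_slowest:.2f}x vs slowest")
```
]]></preformat>
      <p>In both unbalanced layouts, the fastest option is the one that maps on the component holding 336 cores; with 180 cores per component, the four options differ by less than 20 %.</p>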

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2" specific-use="star"><caption><p>Comparison of unbarriered and barriered ping-pongs (pipo) and
barriered mapping time for the T799-ORCA025 toy model on Lenovo on 180 cores
per component for coupling of 1 field, coupling of 10 fields one at a time,
coupling of 10 fields using OASIS3-MCT multiple-coupling-field capability,
and coupling of 10 fields by a single 3-D bundle. All times are for
<italic>src</italic><inline-formula><mml:math id="M4" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula><italic>bfb</italic> mapping for 1000 ping-pongs. For barriered times, MPI barriers
were introduced in both components before the send and before the mapping to
force serialization of work and to time the mappings separately.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="justify" colwidth="142.26378pt"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="42.679134pt"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="51.214961pt"/>
     <oasis:colspec colnum="4" colname="col4" align="justify" colwidth="42.679134pt"/>
     <oasis:colspec colnum="5" colname="col5" align="justify" colwidth="42.679134pt"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">time (seconds) <?xmltex \hack{\hfill\break}?>mapping <inline-formula><mml:math id="M5" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> <italic>src</italic><inline-formula><mml:math id="M6" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula><italic>bfb</italic></oasis:entry>  
         <oasis:entry colname="col2">1 field, <?xmltex \hack{\hfill\break}?>1 coupling</oasis:entry>  
         <oasis:entry colname="col3">10 fields,<?xmltex \hack{\hfill\break}?>10 couplings</oasis:entry>  
         <oasis:entry colname="col4">10 fields, <?xmltex \hack{\hfill\break}?>1 coupling</oasis:entry>  
         <oasis:entry colname="col5">10 fields, <?xmltex \hack{\hfill\break}?>1 bundle</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">pipo time, no barriers</oasis:entry>  
         <oasis:entry colname="col2">1.29</oasis:entry>  
         <oasis:entry colname="col3">10.52</oasis:entry>  
         <oasis:entry colname="col4">11.93</oasis:entry>  
         <oasis:entry colname="col5">12.29</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">pipo time, with barriers</oasis:entry>  
         <oasis:entry colname="col2">1.87</oasis:entry>  
         <oasis:entry colname="col3">17.63</oasis:entry>  
         <oasis:entry colname="col4">16.56</oasis:entry>  
         <oasis:entry colname="col5">17.48</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">map ORCA025 <inline-formula><mml:math id="M7" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula> T799, with barriers</oasis:entry>  
         <oasis:entry colname="col2">0.67</oasis:entry>  
         <oasis:entry colname="col3">5.48</oasis:entry>  
         <oasis:entry colname="col4">4.61</oasis:entry>  
         <oasis:entry colname="col5">4.68</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">map T799 <inline-formula><mml:math id="M8" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula> ORCA025, with barriers</oasis:entry>  
         <oasis:entry colname="col2">0.56</oasis:entry>  
         <oasis:entry colname="col3">5.28</oasis:entry>  
         <oasis:entry colname="col4">4.76</oasis:entry>  
         <oasis:entry colname="col5">4.81</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S3.SS4">
  <title>Field aggregation</title>
      <p>OASIS3-MCT provides a new feature, as described in Sect. 2.2, that allows
users to aggregate coupling of multiple fields into a single coupling
operation by specifying coupled fields via colon-delimited field names in the
<italic>namcouple</italic> file. Table 2 shows unbarriered and barriered ping-pong
and barriered mapping timing for the T799-ORCA025 configuration on Lenovo
using single and multiple fields. For the barriered case, MPI barriers were
added before the send and before the mapping in each component in both
directions of the coupling to strictly enforce serialization of operations
and to be able to time the mapping cost cleanly. Times are in seconds for the
slowest task over the entire run. The fastest time from two test runs is
shown. Variability between runs is less than 2 %. The columns in Table 2
are for a configuration with 180 cores per component using <italic>src</italic><inline-formula><mml:math id="M9" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula><italic>bfb</italic> map
settings for a single field, 10 fields coupled via 10 coupling calls, 10
fields coupled via a single coupling communication, and 10 fields bundled
into a single variable. The bundled fields option will be available in the
OASIS3-MCT_4.0 release. The barriered pipo (ping-pong) time in Table 2 is
about 50 % greater than the unbarriered time. The significant performance
penalty with barriers suggests that there is normally some overlap of
coupling communication and mapping in these timing runs when running without
barriers.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5"><caption><p>OASIS3-MCT_3.0 memory use on Curie Bullx for the
T799-ORCA025 toy model as a function of cores per component.</p></caption>
          <?xmltex \igopts{width=207.705118pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/3297/2017/gmd-10-3297-2017-f05.pdf"/>

        </fig>

      <p>The unbarriered pipo time in Table 2 shows that coupling 10 fields performs
proportionally better than coupling a single field. More specifically, the
case with 10 fields coupled with 10 coupling calls performs best, likely
because there is a greater chance of overlapping mapping and coupling
communication in this case since each field is mapped and sent independently.
The barriered pipo time further suggests that the case with 10 fields coupled
with 10 coupling calls has the greatest amount of overlapping work because
that case has the largest performance degradation when barriers are turned
on.</p>
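      <p>The two observations above can be checked against the values in Table 2. The following Python sketch computes the unbarriered cost per field and the relative penalty introduced by the barriers for each column. The list layout and function names are illustrative only.</p>
<preformat><![CDATA[
```python
# Times in seconds from Table 2 (1000 ping-pongs, src+bfb, 180 cores per component).
CASES = ["1 field, 1 coupling", "10 fields, 10 couplings",
         "10 fields, 1 coupling", "10 fields, 1 bundle"]
FIELDS = [1, 10, 10, 10]
NO_BARRIER = [1.29, 10.52, 11.93, 12.29]
BARRIER = [1.87, 17.63, 16.56, 17.48]


def per_field(total, fields):
    """Ping-pong time divided by the number of fields coupled."""
    return total / fields


def barrier_penalty(barriered, unbarriered):
    """Relative increase in time caused by the barriers, in percent."""
    return 100.0 * (barriered / unbarriered - 1.0)


for case, n, free, barred in zip(CASES, FIELDS, NO_BARRIER, BARRIER):
    print(f"{case:24s} {per_field(free, n):5.2f} s per field, "
          f"barrier penalty {barrier_penalty(barred, free):4.1f} %")
```
]]></preformat>
      <p>With these numbers, every 10-field case costs less per field than the single-field case (1.29 s), and the case with 10 separate couplings shows the largest barrier penalty, about 68 % compared with 39 % to 45 % for the other columns.</p>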
      <p>In contrast, the mapping time for 10 fields coupled via a single operation
is faster than mapping 10 fields one at a time. This is expected as the
underlying implementation aggregates the mapping rearrangement and coupling
communication cost when fields are bundled. But in this case, that mapping
advantage is offset by the ability to overlap less work. This simple test
case carries out coupling without any real model work between calls. In a
real model, the coupling performance will depend on the sequence of the
coupling calls within the model, how much work can be overlapped with
coupling, and the relative core counts and grid sizes of the different
coupling fields.</p>
</sec>
<sec id="Ch1.S3.SS5">
  <title>Conservation</title>
      <p>Table 3 shows the timings of a ping-pong test of the T799-ORCA025 case on the
Lenovo cluster for four different configurations (48 and 180 cores with <italic>src</italic>
or <italic>dst</italic> mapping) with CONSERV unset and CONSERV set to <italic>lsum8</italic> (equivalent
to <italic>opt</italic> in OASIS3-MCT_3.0), <italic>lsum16</italic>, <italic>ddpdd</italic>, <italic>reprosum</italic>, and <italic>gather</italic>
(equivalent to <italic>bfb</italic> in OASIS3-MCT_3.0). The CONSERV implementation and a
description of the different options for the computation of the global sums
are given in Sect. 2.4. Times are accumulated over 1000 ping-pongs for a
single coupling field in each direction. Two trials of each case were carried
out and the minimum time is shown. Differences between trials were less than
2 % except for the <italic>gather</italic> case where variations in time of up to 10 %
were observed. The CONSERV operation increases the pipo time by at least
45 % regardless of the method used compared to CONSERV off (unset), and
the <italic>gather</italic> option is roughly 7 to almost 60 times more expensive
than the other CONSERV methods. When OASIS3-MCT_4.0 is available, <italic>lsum8</italic> will
still be the fastest CONSERV method, while <italic>reprosum</italic> will be the best
bit-for-bit option. The cost of <italic>reprosum</italic> is only slightly higher
than <italic>lsum16</italic>, but reproducibility characteristics are significantly better.
When using CONSERV, it is important to test the performance of various
methods and consider carefully the requirements of the science. Of course,
when possible, mapping weights that are inherently conservative, such as those
from area-overlap conservative remapping (Jones, 1999), should be used to avoid use of the
CONSERV operation altogether.</p>
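      <p>The reproducibility trade-off between the global sum options stems from the fact that floating point addition is not associative, so a sum built from per-process partial sums can change at roundoff level when the decomposition changes. The Python sketch below illustrates this with made-up data. It is not the OASIS3-MCT implementation: <monospace>partial_sum</monospace> is a hypothetical serial model of a decomposed sum, and <monospace>math.fsum</monospace> is used only as a stand-in for an order-independent reference sum.</p>
<preformat><![CDATA[
```python
import math
import random


def partial_sum(values, n_chunks):
    """Sum contiguous chunks first, then add the chunk sums.

    This mimics per-process partial sums followed by a global reduction.
    """
    size = -(-len(values) // n_chunks)  # ceiling division
    chunk_sums = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(chunk_sums)


# Made-up data spanning many orders of magnitude to expose roundoff effects.
random.seed(0)
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
          for _ in range(100_000)]

reference = math.fsum(values)  # correctly rounded, independent of ordering
for n_chunks in (1, 48, 180):
    total = partial_sum(values, n_chunks)
    print(f"{n_chunks:4d} chunks: sum = {total!r}, diff from reference = {total - reference!r}")
```
]]></preformat>
      <p>For a fixed chunk count the result is repeatable, whereas different chunk counts may differ in the last digits; a reproducible method has to remove this dependence on the decomposition, which is where the extra cost comes from.</p>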

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T3" specific-use="star"><caption><p>Comparison of ping-pong (pipo) times for the T799-ORCA025 toy model
on Lenovo on 48 and 180 cores per model with the CONSERV option off (unset),
set to <italic>lsum8</italic> (<italic>opt</italic> in OASIS3-MCT_3.0), <italic>lsum16</italic>, <italic>ddpdd</italic>,
<italic>reprosum</italic>, and <italic>gather</italic> (<italic>bfb</italic> in OASIS3-MCT_3.0). Times are
accumulated over 1000 ping-pongs for a single coupling field in each
direction.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="left"/>
     <oasis:colspec colnum="7" colname="col7" align="left"/>
     <oasis:thead>
       <oasis:row>  
         <oasis:entry colname="col1">cores, mapping</oasis:entry>  
         <oasis:entry colname="col2">CONSERV</oasis:entry>  
         <oasis:entry colname="col3">CONSERV</oasis:entry>  
         <oasis:entry colname="col4">CONSERV</oasis:entry>  
         <oasis:entry colname="col5">CONSERV</oasis:entry>  
         <oasis:entry colname="col6">CONSERV</oasis:entry>  
         <oasis:entry colname="col7">CONSERV</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2">unset</oasis:entry>  
         <oasis:entry colname="col3"><italic>lsum8</italic></oasis:entry>  
         <oasis:entry colname="col4"><italic>lsum16</italic></oasis:entry>  
         <oasis:entry colname="col5"><italic>ddpdd</italic></oasis:entry>  
         <oasis:entry colname="col6"><italic>reprosum</italic></oasis:entry>  
         <oasis:entry colname="col7"><italic>gather</italic></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">48, <italic>src</italic><inline-formula><mml:math id="M10" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula><italic>bfb</italic></oasis:entry>  
         <oasis:entry colname="col2">4.00</oasis:entry>  
         <oasis:entry colname="col3">8.27</oasis:entry>  
         <oasis:entry colname="col4">16.78</oasis:entry>  
         <oasis:entry colname="col5">10.65</oasis:entry>  
         <oasis:entry colname="col6">17.34</oasis:entry>  
         <oasis:entry colname="col7">117.72</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">48, <italic>dst</italic><inline-formula><mml:math id="M11" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula><italic>bfb</italic></oasis:entry>  
         <oasis:entry colname="col2">4.39</oasis:entry>  
         <oasis:entry colname="col3">8.02</oasis:entry>  
         <oasis:entry colname="col4">16.59</oasis:entry>  
         <oasis:entry colname="col5">10.42</oasis:entry>  
         <oasis:entry colname="col6">16.98</oasis:entry>  
         <oasis:entry colname="col7">142.12</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">180, <italic>src</italic><inline-formula><mml:math id="M12" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula><italic>bfb</italic></oasis:entry>  
         <oasis:entry colname="col2">1.25</oasis:entry>  
         <oasis:entry colname="col3">2.21</oasis:entry>  
         <oasis:entry colname="col4">4.59</oasis:entry>  
         <oasis:entry colname="col5">2.87</oasis:entry>  
         <oasis:entry colname="col6">4.85</oasis:entry>  
         <oasis:entry colname="col7">126.91</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">180, <italic>dst</italic><inline-formula><mml:math id="M13" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula><italic>bfb</italic></oasis:entry>  
         <oasis:entry colname="col2">1.56</oasis:entry>  
         <oasis:entry colname="col3">2.26</oasis:entry>  
         <oasis:entry colname="col4">4.62</oasis:entry>  
         <oasis:entry colname="col5">2.92</oasis:entry>  
         <oasis:entry colname="col6">4.90</oasis:entry>  
         <oasis:entry colname="col7">130.01</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S3.SS6">
  <title>Memory</title>
      <p>Figure 5 shows the memory use per core for the T799-ORCA025 test case on
Curie, the same test case as in Figs. 2 and 3. Memory use was determined by
calls into the GPTL (<uri>http://jmrosinski.github.io/GPTL/</uri>) interface,
included in the OASIS3-MCT release, which queries memory usage through C
intrinsics. At 16 000 cores, the infrastructure uses a bit more than 1 GB
per core, which, while not tiny, is generally acceptable for many applications
and hardware. Memory use per core increases at higher core counts. It
is possible that the MPI memory footprint accounts for most of this behavior
(Balaji et al., 2008; Gropp, 2009), but further investigation will be carried
out in the future to better understand this behavior.</p>
</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <title>Conclusions</title>
      <p>OASIS3-MCT was implemented largely to address limitations in parallel
performance of OASIS3 and to provide a framework for use at higher
resolutions. With OASIS3-MCT, the widely used OASIS3 model interfaces (APIs)
and configuration file have largely been preserved, and this explains the
wide adoption of OASIS3-MCT within the OASIS user community. Since its
release in May 2015, about 250 downloads of OASIS3-MCT_3.0 have been
registered from most major climate modeling groups in Europe as well as from
groups in North and South America, Asia, Australia, and Africa. In the last 2
years, the OASIS3-MCT coupler has been used in many state-of-the-art coupled
systems, including high-resolution climate models and systems that frequently
exchange 3-D atmospheric fields between global and regional models, among
others. Other examples of coupled model applications that use OASIS3-MCT can
be found on the OASIS3-MCT coupled model
page<fn id="Ch1.Footn2"><p><uri>https://portal.enes.org/oasis/oasis-dedicated-user-support-1/survey-on-coupled-models-using-oasis-march-2016/coupled-models-using-oasis</uri></p></fn>.</p>
      <p>The underlying software was refactored significantly in OASIS3-MCT to improve
parallel performance and coupling capabilities. MCT serves as a key part of
the OASIS3-MCT implementation and provides parallel capabilities for coupling
operations. OASIS3-MCT_3.0 also provides new capabilities to couple fields
within a single component running on concurrent, overlapping, or partially
overlapping processes. This increases the flexibility of OASIS3-MCT
significantly and provides a mechanism for coupling data between different
decompositions or grids within a single model, among many other things.
OASIS3-MCT can now be used as a coupling layer for components running
sequentially, concurrently, or both; for single or multiple executable
execution; to exchange coupling fields defined on a subset of the component
tasks; and to support features like a separate I/O component included in the
executable but not involved in the coupling. This provides significant
flexibility to lay out models on parallel tasks in relatively arbitrary ways
to optimize overall performance and to build new features into a model beyond
model coupling. OASIS3-MCT has been tested successfully at high resolution,
at high processor counts, and with a large number of coupling fields.</p>
      <p>There are other benefits in the OASIS3-MCT implementation. OASIS3-MCT still
supports mapping weight generation on the fly via SCRIP using a single
processor just like <?xmltex \hack{\mbox\bgroup}?>OASIS3<?xmltex \hack{\egroup}?>. However, mapping files can also be generated
offline, read in directly and relatively efficiently, and reused more easily, and
the cost associated with generating the mapping files can be moved to a
preprocessing step using more sophisticated tools. If online weight
generation needs to be upgraded in OASIS in the future to support, for
instance, time-evolving grids, OASIS will consider incorporating more
sophisticated external tools into the infrastructure. There are new features that support the creation of grid data using a parallel interface,
that couple multiple fields in a single operation, and that generate the
<italic>namcouple</italic>
file offline via a GUI. The requirement for an OASIS3 hub
coupler has been removed, all communication and mapping are done in parallel,
and performance is significantly improved.</p>
      <p>The scaling and performance results in Sect. 3 demonstrate the ability of
OASIS3-MCT to support high-resolution model coupling on large core counts.
However, as core counts get well into the tens of thousands and beyond, there
are questions and concerns about the cost of both the initialization and
coupling exchanges in OASIS3-MCT. The operations in OASIS3-MCT are ultimately
constrained by MPI performance at those core counts, and developers will
continue to pursue performance improvements in the underlying implementation.
However, for the near-term future, say the next 5 years, OASIS3-MCT is likely
to adequately meet the needs of the climate modeling community.</p>
      <p>The flexibility of OASIS3-MCT to map fields by various approaches, and the
relative cost of those approaches, were shown. A general recommendation is to test different
approaches and to choose the approach that yields the best performance. While
it is always first recommended to use conservative mapping weights to avoid
the use of the global CONSERV transformation, the performance of the
different options of this transformation was shown for a high-resolution
case. If the CONSERV transformation is needed, the more efficient <italic>lsum8</italic>
(<italic>opt</italic> in OASIS3-MCT_3.0) option, implemented using partial sums, is
recommended unless bit-for-bit reproducible results on different core counts
are absolutely required. The partial sum option will produce bit-for-bit
reproducible results for a configuration with fixed process counts and
decomposition and will introduce no more than roundoff level differences when
changing process counts or decomposition. CONSERV options planned for the
OASIS3-MCT_4.0 release were also included in the results in Sect. 3. In
OASIS3-MCT_4.0, the new <italic>reprosum</italic> option will significantly improve
the performance of the bit-for-bit CONSERV option compared to the currently
available <italic>gather</italic> (<italic>bfb</italic> in OASIS3-MCT_3.0) option.</p>
      <p>The ability to couple multiple fields via a single coupling operation was
demonstrated. While not shown in this study, OASIS3-MCT has been used to
successfully couple over 10 000 fields in some coupled systems within the
community. Those tests were carried out with both single field coupling and
multiple field coupling with success. In that case, multiple field coupling
significantly reduces the size of the <italic>namcouple</italic> file. Multiple field
coupling was shown to reduce the mapping time compared to coupling the same
number of fields individually. The performance benefit of using the multiple
field feature in the overall coupling time is less clear and will depend on
the sequencing and design of each coupled system.</p>
      <p><?xmltex \hack{\newpage}?>A number of future extensions are being considered for OASIS3-MCT. In theory,
it should be possible to combine the mapping and coupling steps to eliminate
a field rearrangement and further reduce communication cost. As a first step,
decomposition strategies that could reduce the rearrangement cost in the
mapping operation are being developed for release in OASIS3-MCT_4.0. There
are also many opportunities in OASIS3-MCT to improve the I/O performance. In
the current version, I/O is done via a gather and/or scatter to/from a root
task and data are written in serial from the root task. This is likely to
eventually lead to memory and performance issues. Finally, better support
within OASIS3-MCT for shared memory threading (i.e., OpenMP) and on various
multi-core architectures is likely to become more important in the future.</p>
      <p>In summary, OASIS3-MCT_3.0 is the latest released version of the OASIS
coupler. OASIS3-MCT extends the widely used OASIS software with backwards
compatibility with regard to usage, but has an entirely new implementation
internally. It provides the functional capability to couple high-resolution
structured or unstructured grids at high core counts successfully and should
serve the community well for the next several years. The underlying
implementation continues to be improved, and OASIS3-MCT_4.0 is expected to
be ready for release in 2018.</p>
</sec>

      
      </body>
    <back><notes notes-type="codeavailability">

      <p>The OASIS3-MCT source code is available for use and testing after
registration at <uri>https://portal.enes.org/oasis/download</uri>. The SVN
command line to download OASIS3-MCT_3.0 is “svn checkout
<uri>https://oasis3mct.cerfacs.fr/svn/branches/OASIS3-MCT_3.0_branch/oasis3-mct</uri>” (last access: September 2017). The OASIS3-MCT_3.0
source code is also available as a tar file at <uri>ftp://ftp.cerfacs.fr/pub/globc/exchanges/distrib-oasis/oasis3-mct.tar.gz</uri> (last access: September 2017).</p>
  </notes><?xmltex \hack{\clearpage}?><app-group>

<app id="App1.Ch1.S1">
  <title/>
      <p>The following list provides a history of changes to OASIS3-MCT, from OASIS3
up to OASIS3-MCT_3.0. It also includes an initial list of
some features expected in the next release, OASIS3-MCT_4.0.<?xmltex \hack{\newline}?><?xmltex \hack{\newline}?>
OASIS3-MCT_1.0 (2012)
<list list-type="bullet"><list-item><p>requirement for separate coupler processes and hub removed</p></list-item><list-item><p>use of MCT in the underlying coupling layer for regridding and communication</p></list-item><list-item><p>parallel remapping</p></list-item><list-item><p>fully parallel communication</p></list-item><list-item><p>ability to couple a single field to multiple destinations</p></list-item><list-item><p>extended ability to read mapping files</p></list-item><list-item><p>improved deadlock trapping</p></list-item><list-item><p>only MPI1 job launching supported</p></list-item><list-item><p>ability to couple on a subset of processes</p></list-item><list-item><p>support for 1-D coupling field arrays</p></list-item><list-item><p>support for prism_ and oasis_ interface names</p></list-item><list-item><p>restart files for LOCTRANS operations</p></list-item><list-item><p>coupling multiple fields through a single <italic>namcouple</italic> entry</p></list-item></list></p>
      <p><?xmltex \hack{\newpage}?>OASIS3-MCT_2.0 (2013)
<list list-type="bullet"><list-item><p>support for bicubic interpolation, provided that the field gradient is specified in
the interface arguments</p></list-item><list-item><p>coupling support on a subdomain of the full grid</p></list-item><list-item><p>update to timing and debugging capabilities</p></list-item><list-item><p>parallel interface to grid writing</p></list-item></list></p>
      <p>OASIS3-MCT_3.0 (2015)
<list list-type="bullet"><list-item><p>improved memory use, initialization cost, and scaling</p></list-item><list-item><p>updated mapping file reading algorithm</p></list-item><list-item><p>ability to implement a coupled system within a single executable</p></list-item><list-item><p>ability to couple sequentially and on partially or completely overlapping
processes</p></list-item></list></p>
      <p>OASIS3-MCT_4.0 (2018?)
<list list-type="bullet"><list-item><p>support for bundled coupling fields</p></list-item><list-item><p>additional CONSERV global sum methods and improved CONSERV bit-for-bit
performance</p></list-item><list-item><p>a new option for decomposing the mapped field to reduce communication cost</p></list-item><list-item><p>an update to a newer version of MCT that may improve initialization
performance</p></list-item></list></p><?xmltex \hack{\clearpage}?>
</app>
  </app-group><notes notes-type="competinginterests">

      <p>The authors declare that they have no conflict of interest.</p>
  </notes><ack><title>Acknowledgements</title><p>This research was supported by the ESiWACE H2020 European project, grant
agreement no. 675191 (<uri>www.esiwace.eu</uri>), the IS-ENES2 FP7 European
project, contract number 312979 (<uri>https://verc.enes.org/ISENES2</uri>), and
the CONVERGENCE project funded by the French National Research Agency:
ANR-13-MONU-0008.<?xmltex \hack{\newline}?><?xmltex \hack{\newline}?> Edited by: Claire
Levy<?xmltex \hack{\newline}?> Reviewed by: Moritz Hanke and two anonymous referees</p></ack><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><mixed-citation>Balaji, P., Buntinas, D., Goodell, D., Gropp, W. D., Kumar, S., Lusk, E. L.,
Thakur, R., and Traff, J. L.: MPI on a Million Processors, in: Recent Advances in
Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, Volume 5759, 20–30,
<ext-link xlink:href="https://doi.org/10.1007/978-3-642-03770-2_9" ext-link-type="DOI">10.1007/978-3-642-03770-2_9</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><mixed-citation>Craig, A. P., Vertenstein, M., and Jacob, R.: A New Flexible Coupler for
Earth System Modeling developed for CCSM4 and CESM1, Int. J. High Perf.
Comp. App., 26, 31–42, <ext-link xlink:href="https://doi.org/10.1177/1094342011428141" ext-link-type="DOI">10.1177/1094342011428141</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><mixed-citation>Dauptain, A.: OpenTEA Super-User Guide, available at:
<uri>https://oasis3mct.cerfacs.fr/svn/trunk/oasis3-mct/util/oasisgui/OpenteaSUG.pdf</uri> (last access: September 2017), 2014.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><mixed-citation>Gropp, W.: MPI at Exascale: Challenges for Data Structures and Algorithms,
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent
Advances in Parallel Virtual Machine and Message Passing Interface, edited by:
Ropo, M., Westerholm, J., and Dongarra, J., Springer-Verlag,
Berlin, <ext-link xlink:href="https://doi.org/10.1007/978-3-642-03770-2_3" ext-link-type="DOI">10.1007/978-3-642-03770-2_3</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><mixed-citation>He, Y. and Ding, C. H. Q.: Using Accurate Arithmetics to Improve Numerical
Reproducibility and Stability in Numerical Applications, The Journal of
Supercomputing, 18, 259, <ext-link xlink:href="https://doi.org/10.1023/A:1008153532043" ext-link-type="DOI">10.1023/A:1008153532043</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><mixed-citation>Hollingsworth, A., Engelen, R. J., Textor, C., Benedetti, A., Boucher, O.,
Chevallier, F., Dethof, A., Elbern, H., Eskes, H., Flemming, J., Granier,
C., Kaiser, J. W., Morcrette, J.-J., Rayner, P., Peuch, V. H., Rouil, L.,
Schultz, M. G., Simmons, A. J., and The GEMS Consortium: Toward a Monitoring
and Forecasting System For Atmospheric Composition: The GEMS Project, B. Am.
Meteorol. Soc., 89, 1147–1164, 2008.
 </mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bib7"><label>7</label><mixed-citation>Jacob, R., Larson, J., and Ong, E.: MxN Communication and Parallel Interpolation
in CCSM3 Using the Model Coupling Toolkit, Int. J. High Perf. Comp.
App., 19, 293–307, <ext-link xlink:href="https://doi.org/10.1177/1094342005056116" ext-link-type="DOI">10.1177/1094342005056116</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><mixed-citation>
Jones, P.: Conservative remapping: First- and second-order conservative
remapping, Mon. Weather Rev., 127, 2204–2210, 1999.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><mixed-citation>Larson, J., Jacob, R., and Ong, E.: The Model Coupling Toolkit: A New
Fortran90 Toolkit for Building Multiphysics Parallel Coupled Models, Int. J.
High Perf. Comp. App., 19, 277–292, <ext-link xlink:href="https://doi.org/10.1177/1094342005056115" ext-link-type="DOI">10.1177/1094342005056115</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><mixed-citation>Mirin, A. A. and Worley, P. H.: Improving the Performance Scalability of the
Community Atmosphere Model, Int. J. High Perf. Comp. App., 26, 17–30,
<ext-link xlink:href="https://doi.org/10.1177/1094342011412630" ext-link-type="DOI">10.1177/1094342011412630</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><mixed-citation>Redler, R., Valcke, S., and Ritzdorf, H.: OASIS4 – a coupling software for
next generation earth system modelling, Geosci. Model Dev., 3, 87–104,
<ext-link xlink:href="https://doi.org/10.5194/gmd-3-87-2010" ext-link-type="DOI">10.5194/gmd-3-87-2010</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><mixed-citation>
Terray, L., Thual, O., Belamari, S., Déqué, M., Dandin, P., Lévy,
C., and Delecluse, P.: Climatology and interannual variability simulated by
the ARPEGE-OPA model, Clim. Dynam., 11, 487–505, 1995.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><mixed-citation>Theurich, G., Deluca, C., Campbell, T., Liu, F., Saint, K., Vertenstein, M., Chen, J., Oehmke, R., Doyle, J.,
Whitcomb, T., Wallcraft, A., Iredell, M., Black, T., Da Silva, A. M., Clune, T., Ferraro, R.,
Li, P., Kelley, M., Aleinov, I., Balaji, V., Zadeh, N., Jacob, R., Kirtman, B., Giraldo, F.,
McCarren, D., Sandgathe, S., Peckham, S., and Dunlap IV, R.: The Earth System Prediction Suite: Toward a Coordinated U.S.
Modeling Capability, B. Am. Meteorol. Soc., 97, 1229–1247, <ext-link xlink:href="https://doi.org/10.1175/BAMS-D-14-00164.1" ext-link-type="DOI">10.1175/BAMS-D-14-00164.1</ext-link>,
2016.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><mixed-citation>Valcke, S.: The OASIS3 coupler: a European climate modelling community
software, Geosci. Model Dev., 6, 373–388,
<ext-link xlink:href="https://doi.org/10.5194/gmd-6-373-2013" ext-link-type="DOI">10.5194/gmd-6-373-2013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><mixed-citation>Valcke, S., Craig, T., and Coquart, L.: OASIS3-MCT User Guide, OASIS3-MCT
1.0, Technical Report, TR/CMGC/12/49, CERFACS, Toulouse, France, available
at:
<uri>http://www.cerfacs.fr/oa4web/papers_oasis/oasis3mct_UserGuide_1.0.pdf</uri> (last access: September 2017),
2012.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><mixed-citation>Valcke, S., Craig, T., and Coquart, L.: OASIS3-MCT User Guide,
OASIS3-MCT_3.0, Technical Report, TR/CMGC/15/38, CERFACS/CNRS/SUC URA
No1875, Toulouse, France, available at:
<uri>http://www.cerfacs.fr/oa4web/oasis3-mct_3.0/oasis3mct_UserGuide.pdf</uri> (last access: September 2017),
2015.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><mixed-citation>Valcke, S., Craig, A., Dunlap, R., and Riley, G.: Sharing Experiences and
Outlook on Coupling Technologies for Earth System Models, B. Am. Meteorol.
Soc., 97, ES53–ES56, <ext-link xlink:href="https://doi.org/10.1175/BAMS-D-15-00239.1" ext-link-type="DOI">10.1175/BAMS-D-15-00239.1</ext-link>, 2016.</mixed-citation></ref>

  </ref-list></back>
    </article>
