Introduction
The weather- and climate-modelling community is currently seeing a
paradigm shift from limited-area models towards novel approaches involving global,
complex and irregular meshes. Yet regional models remain commonly used in
numerical weather prediction and to study past, current and future climate at
high spatial and temporal resolution over areas of specific interest.
A wealth of such models currently exists; they differ in their discretisation
of the computational grid, their implementation of the numerical solvers,
their parameterisation of physical processes and, most notably, in their
simulation results. Despite these
differences, these models share the common principle of nested modelling: regional
climate information is generated by supplying a set of initial conditions as
well as time-varying lateral boundary conditions (LBCs; large-scale
atmospheric fields such as wind, temperature, geopotential height and
hydrometeors) and lower boundary conditions (sea surface temperature, sea
ice) to the regional model. The idea behind this approach is that the LBCs
keep the regional climate model (RCM) solution consistent with the forcing
atmospheric circulation, while small-scale patterns are generated with higher
accuracy due to the increase in temporal and spatial resolution. Sub-grid
scale processes in the RCM are included through parameterisations, which can
be entirely different from those of the forcing global circulation model
(GCM).
Supplying lateral boundary conditions to nested models can cause
severe problems, up to the point where the RCM solution becomes
inconsistent with the forcing data. Starting off as an
initial-value problem, the RCM solution gradually becomes a boundary-value
problem, which from a mathematical point of view is fundamentally
ill-posed. This is less of an issue in the
context of numerical weather prediction (NWP), where typical model
run times are 3 to 15 days, than in seasonal forecasting (weeks to
months) and in regional climate modelling (years to centuries).
A common problem for both short- and long-term forecasting is that the
solution of the RCM seems to vary with the size of the computational
domain, as well as with its location and the season. Several authors have shown that
nudging techniques (grid or spectral nudging) towards the large-scale
features of the forcing model can reduce these adverse effects, but
that they can also hide model biases.
Further, the technique of grid nesting introduces discontinuities in
spatial resolution between the regional model and the coarser-grid
driving model, as well as between the nests within the regional model
itself. For a typical refinement ratio of 3, two-thirds of the spatial
wave-number spectrum present in the fine mesh are absent in the coarse
mesh. This implies that (a) these features must be
spun-up for inflows into the higher-resolution domain, and that (b) these wave numbers are reflected at the domain boundary for outflows
from the high-resolution domain to the coarser domain. To address the
latter issue, filters that are efficient over a large range of wave
numbers are required. The temporal interpolation required by nesting
can introduce further numerical artefacts, in particular when
interpolating forcing LBCs, usually available at 3–6 h time steps, to
the model integration time step of the high-resolution domain
(typically 6 s per 1 km grid size).
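To illustrate the issue, the short sketch below (a minimal, hypothetical example, not code from any of the models discussed here) linearly interpolates 6-hourly boundary values to a 6 s integration time step; between two forcing times the boundary then evolves along a straight line, regardless of any faster variability the driving model actually produced.

```python
import numpy as np

# Hypothetical 6-hourly lateral boundary values, e.g. temperature in K.
lbc_times = np.arange(0, 24 * 3600 + 1, 6 * 3600)           # seconds
lbc_values = np.array([285.0, 287.5, 291.0, 288.0, 286.0])

# Model integration time steps of 6 s (1 km grid size).
model_times = np.arange(0, 24 * 3600 + 1, 6)
lbc_on_steps = np.interp(model_times, lbc_times, lbc_values)

# The interpolated forcing is piecewise linear: fast-evolving features
# between the 6-hourly snapshots are lost at the domain boundary.
print(lbc_on_steps[:5])
```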
One obvious solution is to avoid using LBCs and nesting by running
a global model at the resolution required for the area of
interest. This, however, is prohibitively expensive or simply not
feasible, even on the latest generations of supercomputers. An
intermediate approach therefore is to run a global model at a moderate
resolution and use a smooth mesh transition on a variable-resolution
grid, where filters are efficient at the local scale of the
corresponding grid cell. Besides the Model for Prediction Across Scales
(MPAS) model discussed here, a few other recent developments, such as the
icosahedral non-hydrostatic model ICON, adopt this
strategy. Applying such models for mid- and long-term regional climate
simulations has only recently become possible and requires substantial
computational power.
MPAS is a recent numerical
modelling framework that includes an atmospheric model (MPAS-A), an ocean model (MPAS-O) and
a land-ice model (MPAS-LI). The MPAS model is a collaborative project, led by
the National Center for Atmospheric Research (NCAR) and the Los Alamos National Laboratory (LANL). The three components of MPAS in principle form
a so-called Earth system model (ESM), but a coupler between them is not yet
available. A common feature of the three MPAS constituents is the use of unstructured,
centroidal Voronoi meshes (spherical centroidal Voronoi tessellations,
SCVTs), which allow for the generation of global, irregular,
variable-resolution meshes with smooth transitions between areas of different
refinement.
Upper left: sketch of a variable-resolution Voronoi mesh used for the horizontal grid;
upper right: C-grid staggering of state variables adapted from;
lower left: a block of owned cells (blue) assigned to a Message Passing Interface (MPI) task, along with two layers of halo cells (red, orange);
lower right: partitioning of the regular 120 km mesh with 40 962 grid cells for 64 tasks.
The atmospheric model MPAS-A is a global, fully compressible
non-hydrostatic model using finite-volume numerics. Based on the
Voronoi mesh, the model uses a C-grid staggering for the state
variables (i.e. wind components are modelled at the faces of every
cell, and the prognosed component of the wind is orthogonal to the
cell face), as illustrated in Fig. (upper panels). The governing equations can then be cast in a way
such that energy, momentum and water vapour content are conserved more precisely
(in the form of potential temperature, mass and scalar mass; see Sect. 2). The
MPAS-A model builds on existing, well-established techniques of the
Advanced Research Weather Research and Forecasting model
(WRF-ARW), for example the split-explicit time
integration scheme for the treatment of gravity waves and horizontally
propagating acoustic waves. It
also contains a subset of WRF's physics parameterisations that are
suitable for climate modelling purposes. While WRF uses
a terrain-following hydrostatic pressure coordinate for the vertical
discretisation, MPAS employs a height-based terrain-following vertical
coordinate. The latter discretisation reduces artificial circulations
caused by inaccuracies in the horizontal pressure gradient
term. It should be noted here that because MPAS was developed
out of a regional climate modelling context, the typical model tops in MPAS-A
are around 30 to 42 km, while current Earth system models work
with model tops of up to 100 km. At the time of writing, work is underway to
increase the model tops in MPAS-A for future applications as an Earth system
modelling component.
Until recently, advances in computational power following Moore's law were
mainly driven by transistor speed and energy scaling, as well as by
micro-architecture advances. Physical limitations and practical energy
concerns will create new challenges for continued performance scaling in the
coming decades. In consequence, large-scale parallelism and the use of
accelerators will be required to achieve performance and energy
efficiency. Hence, a key aspect of modern numerical codes is their
ability to scale on massively parallel systems. The quasi-uniform centroidal
Voronoi meshes used by MPAS are similar to icosahedral (hexagonal) meshes and
can provide nearly uniform resolution over the globe, as opposed to
latitude–longitude grids that require polar filtering to overcome the issue
of converging grid lines at the poles. Grids requiring polar filtering or
spherical transform methods do not scale very well on massively parallel
systems. With MPAS, an efficient parallelisation can be
achieved by aligning all grid cells in a 1-D array, with the vertical
coordinate stacked on top as the second dimension. Good
scaling has been achieved in early weak and strong scaling tests. However,
a thorough investigation of the scalability of MPAS on parallel and massively
parallel systems has not yet been conducted.
In this study, we investigate the performance of MPAS for different
problem sizes on four high-performance computing (HPC) facilities in Europe. For each problem,
strong scaling tests are conducted on all four platforms, which cover
a range of different architectures to reflect the large variety of
computational systems available for research. Additionally, we conduct
extreme scaling tests using a 3 km global mesh to study the
scalability of MPAS up to nearly half a million tasks and to
demonstrate that global, convection-resolving simulations are becoming
possible. We explore the limits of the MPAS model when its parallel
efficiency breaks down and identify opportunities for improvement. We
further derive estimates on the feasibility to conduct longer runs at
convection-resolving resolution on current HPC facilities.
We also assess the quality of the MPAS model output in terms of its accuracy
for climate modelling. We have chosen to study the particular problem of
reproducing the characteristics of the West African monsoon (WAM). The WAM is
the most prominent feature of the West African climate and accounts for the
majority of the annual precipitation. Differential heating of the ocean and
the land surface causes a seasonal change of the large-scale wind systems
during boreal summer, which results in the migration of the inter-tropical
convergence zone (ITCZ) and the associated rain band northwards over the West
African continent. The WAM is driven by a complex and not yet fully
understood interplay of various dynamical features.
GCMs often fail to reproduce this annual movement of the ITCZ due to
their limited temporal and spatial resolution.
Despite the deficiencies discussed
above, regional climate models can improve the representation of
precipitation in comparison to their forcing
data set. Variable-resolution meshes allow for resolving
the region of interest (greater West Africa in this case) at high resolution,
while keeping the model aligned with large-scale features outside of this
area. It is hoped that this will lead to an improved representation
of the WAM. It also opens up the possibility of studying processes such as the
teleconnection between the Indian monsoon and the WAM, or the impact of
land use changes on weather and climate, in a consistent approach.
Specifications of the four HPC facilities used for the
medium-size scaling tests in Sect. .

                TGCC Curie (a)        FZJ Juqueen          SCC ForHLR1 (a)       FZJ Juropatest
System          B510 Bullx            IBM Blue Gene/Q      Megware MiriQuid      T-Platform V210s
Processor       Intel Sandy Bridge    IBM PowerPC A2       Intel Ivy Bridge      Intel Haswell
                E5-2680, 2.7 GHz      1.6 GHz              E5-2670v2, 2.4 GHz    E5-2695v3, 2.3 GHz
Cores/node      2×8                   1×16                 2×10                  2×14
Mem./node       64 GB                 16 GB                64 GB                 128 GB
Nodes total     5040                  28 672               512                   60
MPI network     Infiniband QDR        5-D Torus            Infiniband FDR        Infiniband FDR
                5 GB s-1              40 GB s-1            7 GB s-1              7 GB s-1
Filesystem      LUSTRE                GPFS                 LUSTRE                GPFS
                100 GB s-1            200 GB s-1           16 GB s-1             30 GB s-1
I/O network     same as MPI           10 Gb Ethernet (b)   same as MPI           10 Gb Ethernet
MPI library     Bull MPI              IBM MPI              Intel MPI             Intel MPI
Compiler        Intel icc/ifort       IBM bgxlc/bgxlf95    Intel icc/ifort       Intel icc/ifort
Peak perf.      2 Petaflops           5.9 Petaflops        216 Teraflops         72 Teraflops

(a) Thin nodes only, ignoring fat nodes (larger memory) and hybrid nodes (GPU accelerators).
(b) Using dedicated I/O nodes, typically 128 I/O nodes per 1024 nodes (1 rack).
The paper is organised as follows: in Sect. , we
introduce the HPC facilities used for this study, provide details
about the MPAS-A code and present the scaling experiments with
moderate problem sizes. We continue in Sect. with
an analysis of the physical accuracy of the MPAS model in comparison
to observational data and data from our own regional climate modelling
experiments. Section is devoted to the
extreme scaling tests, and Sect. summarises our
findings and gives an outlook on future modelling activities. Lastly,
Sect. provides information on how to access the
model code and the test cases presented in Sects. and .
Scaling experiments for moderate problem sizes
We perform strong scaling experiments for three different meshes and
on four HPC facilities in Europe. The problem sizes range from
a regular 120 km mesh with 40 962 cells as the smallest problem to
a large, variable-resolution 60–12 km mesh with 535 554 grid
cells. An intermediate test case with a variable-resolution
100–25 km mesh with 163 842 grid cells is studied as well.
HPC facilities
Two of the four HPC systems, the Très Grand Centre de Calcul (TGCC)
Curie and the Forschungszentrum Jülich (FZJ) Juqueen,
are among the largest machines in Europe and are part of the PRACE Tier-0
pool.
The technical specifications for each of the systems are summarised in
Table ; a brief summary is given in the following.
TGCC Curie:
the TGCC Curie went into service in 2012 and consists of 360 fat nodes, 16 hybrid nodes
(neither used in this study) and 5040 thin nodes with two eight-core Intel Sandy Bridge CPUs each. An InfiniBand QDR network is used for
both the compute network and the I/O (input and output) to the global LUSTRE file system. With three different node types, Curie addresses a wide range of scientific challenges.
FZJ Juqueen:
the FZJ Juqueen is an IBM Blue Gene/Q system and was installed in
2012/13. It hosts 28 racks with 1024 nodes per rack and 16 cores per
node, which totals to 458 752 physical cores. Simultaneous
multi-threading is supported by the hardware, but not
used in this study due to the lack of threading in
MPAS-A (see below). A 5-D Torus interconnect is used as compute network,
while I/O is redirected to dedicated I/O nodes using a 10 Gb
Ethernet to connect to the General Parallel File System (GPFS) file system. With a large number of relatively slow CPUs
and a small memory per core, Juqueen most resembles the future
massively parallel systems described above. As such, porting and
scaling experiments of numerical codes onto this architecture are as
challenging as instructive for future applications.
SCC ForHLR1:
the ForHLR1 is the most recent addition to the SCC's HPC systems, installed in September 2014. It hosts different types of nodes to cater for
the various needs of the modelling community. The workhorse for
parallel applications, and the nodes used in this study, are its 512 thin nodes with two 10-core
Intel Ivy Bridge CPUs each, connected via an Infiniband FDR (fourteen data rate) interconnect.
A central LUSTRE filesystem is attached to the nodes, using the same Infiniband interconnect for I/O
as for the compute network.
FZJ Juropatest:
the FZJ Juropatest cluster is a small but cutting-edge prototype system and
consists of 70 T-Platform V210s blades with two 14-core Intel
Haswell processors each. The Message Passing Interface (MPI) communication is realised over Infiniband FDR,
and the I/O to the central GPFS file system is routed via
10 Gb Ethernet. While optimising the MPAS model for the Haswell features
and instruction sets is beyond the scope of this study, it will become
an inevitable step in future model development and tuning.
On Juropatest, we conduct two sets of runs for each of the test cases:
for the first set (Jtest-half in the following), we use only one of the two
available 14-core CPUs in each node, which implies a similar
number of cores per node for Curie and Juropatest or, in other words,
a similar number of nodes for the same total number of tasks. In this
configuration, each task is bound to one core on the node. For the
second set (Jtest-full in the following), we use both CPUs,
i.e. 28 cores on each node, to exploit the full capabilities of the
Juropatest system and to expose possible memory bandwidth limitations of MPAS-A.
MPAS-A code
For the strong-scaling studies in this paper, we use MPAS-A v3.1,
released on 24 November 2014. This release of the model employs
a horizontal domain decomposition for parallel execution, and
parallelisation is implemented using MPI only; in this version of the
code, no threading is used. The MPAS code is written almost entirely
in Fortran 2003, with a few minor parts written in C. The MPAS build
system uses only the make utility, with settings for
different compilers and architectures described as different targets
in the top-level Makefile; see Appendix
for the compiler flags used in this study.
The optimal parallelisation and distribution of the cells of the Voronoi mesh
for a given number of tasks is treated as a graph partitioning problem. The
dual mesh of a Voronoi tessellation is a Delaunay triangulation, which
immediately provides the connectivity graph for the primal (i.e. Voronoi)
cells in the mesh. In MPAS, the graph partitioning is computed as a separate
pre-processing step for which the METIS software is
used. An optimal
partitioning distributes equal work (by proxy, the number of cells) to each
task while minimising the edge cut (assumed to model the communication
between tasks). METIS uses a multilevel k-way partitioning scheme, which
produces partitions of comparable quality to traditional multilevel bisection
algorithms and is about 2 orders of magnitude faster.
The resulting graph partitioning can be critical for the model performance
due to, for example, a large overhead of communication and computational
imbalances between the individual partitions.
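As an illustration of this pre-processing step, the sketch below drives the stand-alone METIS partitioner gpmetis from Python and summarises the resulting distribution of owned cells over tasks. The file name is an assumption for illustration; the gpmetis invocation and the one-part-id-per-line output format follow the METIS documentation.

```python
import subprocess
from collections import Counter

ntasks = 64
graph = "x1.40962.graph.info"  # assumed name of the mesh connectivity file

# gpmetis writes the partition to "<graph>.part.<ntasks>",
# with one part id (0 .. ntasks-1) per cell and line.
subprocess.run(["gpmetis", graph, str(ntasks)], check=True)

with open(f"{graph}.part.{ntasks}") as f:
    parts = [int(line) for line in f]

owned = Counter(parts)  # number of owned cells per MPI task
print(f"cells per task: min={min(owned.values())}, "
      f"max={max(owned.values())}, mean={len(parts) / ntasks:.1f}")
```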
Parallel efficiency (%) for the 120 km regular mesh
with 40 962 grid cells, the variable 100–25 km mesh with 163 842 grid cells and the variable 60–12 km mesh with 535 554 grid cells.
At start-up, the MPAS-A model reads a file that assigns Voronoi cells
to each of the MPI tasks according to a partitioning produced by
METIS. The set of cells assigned to an MPI task is referred to as a
block, and the cells in this assignment are referred to as the
owned cells. The dynamical solver in MPAS-A requires stencils of
cells in order to apply various operators, and as part of the model
start-up, referred to internally as the bootstrapping process,
a pre-determined number of layers of halo cells (sometimes referred to
as ghost cells in other modelling systems) are added around each
block. Although the number of halo layers can vary between the different
MPAS models, MPAS-A adds two layers of halo cells around each block of cells,
as illustrated in Fig. (lower left panel). A lower bound
for the number of halo cells Nh for a given number of owned cells No
can be estimated as

Nh = π (√(No/π) + 2)^2 − No.
At points in the
MPAS-A dynamical solver where current values of fields in halo cells
are required, values are communicated between tasks, from owned cells
to ghost cells, with point-to-point MPI communications.
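As a quick numerical check of this lower bound, the sketch below evaluates the formula for the owned-cell counts that occur in the breakdown analysis later in this paper (2048, 4096 and 8192 tasks on the 60–12 km mesh); it reproduces the halo-to-owned ratios quoted there.

```python
import math

def halo_lower_bound(n_owned: int) -> int:
    """Lower bound on the number of halo cells for a roughly circular
    block of owned cells: Nh = pi * (sqrt(No/pi) + 2)**2 - No."""
    return round(math.pi * (math.sqrt(n_owned / math.pi) + 2) ** 2 - n_owned)

# Owned cells per task on the 60-12 km mesh (535 554 cells)
# for 2048, 4096 and 8192 tasks.
for n_owned in (261, 131, 65):
    n_halo = halo_lower_bound(n_owned)
    print(f"owned={n_owned:4d}  halo>={n_halo:4d}  ratio={n_owned / n_halo:.2f}")
```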
An important aspect and common bottleneck in numerical weather prediction and
climate modelling is disk I/O, since large 4-D fields such as temperature,
geopotential height or wind components need to be written to the disk
frequently. In MPAS v3.1, I/O is facilitated by the parallel I/O library
(PIO)
v1.7.1, a wrapper with an easy-to-use application programming interface (API) that encapsulates the complexity of
parallel I/O for a number of supported formats: binary, serial
NetCDF,
Parallel-NetCDF
and recently (since v.1.9.14) parallel NetCDF-4 through
PHDF5 . PIO is
compiled without further optimisation (standard settings) on all four
machines. The HDF5, NetCDF and Parallel-NetCDF libraries are provided as
modules on all four systems.
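For readers unfamiliar with this style of output, the sketch below illustrates the general idea behind PIO-style collective writes, using the parallel mode of the netCDF4-python bindings as a stand-in (PIO itself has no Python interface); it assumes an MPI-enabled NetCDF/HDF5 stack and is purely conceptual.

```python
from mpi4py import MPI
import numpy as np
from netCDF4 import Dataset

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

ncells, nlev = 40962, 41  # e.g. the regular 120 km mesh
counts = [ncells // size + (r < ncells % size) for r in range(size)]
start = sum(counts[:rank])  # this task's slice of the 1-D cell array

# All tasks open the file collectively; the library coordinates the writes.
nc = Dataset("diag.nc", "w", parallel=True, comm=comm, info=MPI.Info())
nc.createDimension("nCells", ncells)
nc.createDimension("nVertLevels", nlev)
theta = nc.createVariable("theta", "f8", ("nCells", "nVertLevels"))

# Each task writes only the rows corresponding to its owned cells.
theta[start:start + counts[rank], :] = np.full((counts[rank], nlev), 300.0)
nc.close()
```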
Unless stated otherwise, all experiments are conducted with
double-precision floating-point arithmetic, 41 atmospheric levels, 4 soil
levels, a full suite of physics and dynamics (see
Appendix for details), and standard disk I/O. Each
experiment is run for 24 h model time, during which an initial
conditions file is read (init.nc), diagnostic output files
are written every 3 h (diag.nc), and a final restart file
and a comprehensive output file are written at the end
(restart.nc, output.nc). The model integration time
step depends on the grid resolution and is mentioned in the individual
sections below. Note that for variable-resolution meshes, the global
time step is determined by the smallest grid size. By default, all
tasks participate in the parallel I/O. Each experiment is repeated
once or twice, depending on how close the measured run times are, to account
for fluctuations of single experiments.
Regular 120 km grid
The first and smallest test case consists of a global, regular 120 km
mesh with 40 962 grid cells, which is roughly comparable in
resolution to a 284×142 (latitude–longitude) grid. It is thus
in the range of current Earth system models. A model integration time
step of 150 s is adopted. For a resolution of 120 km, this is an
extremely conservative setting (1.25 s per km grid size) for MPAS-A,
which implies 576 integration time steps for a 24 h test run,
compared to 120–144 integration time steps for typical values of
5–6 s per km grid size. This increases the time spent for the actual
parallel integration relative to that for model initialisation and
disk I/O. The decomposition of the 120 km grid for 64 tasks
is illustrated as an example in Fig. (lower right panel),
while Fig. (left panel) shows
the scaling plots on the four HPC facilities described above. For an
easier comparison of the scalability of the different test cases, the
scaling is displayed as a parallel efficiency (i.e. the ratio of real
scaling and ideal scaling) vs. the number of tasks (bottom
horizontal axis) and number of cells owned per task (top horizontal
axis). Table provides further details about the
scaling, whereas Table summarises the size of the
files to be read and written during one model run.
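For reference, the parallel efficiency used throughout this section can be computed from measured run times as in the following minimal sketch (the timings are made up for illustration; the reference point is the smallest run of each series).

```python
def parallel_efficiency(tasks, runtimes):
    """E(n) = (T_ref * n_ref) / (T(n) * n), relative to the smallest run."""
    n_ref, t_ref = tasks[0], runtimes[0]
    return [100.0 * (t_ref * n_ref) / (t * n) for n, t in zip(tasks, runtimes)]

# Hypothetical 24 h run times (s) for increasing task counts.
tasks = [16, 32, 64, 128, 256]
runtimes = [1800.0, 930.0, 490.0, 280.0, 190.0]
for n, eff in zip(tasks, parallel_efficiency(tasks, runtimes)):
    print(f"{n:4d} tasks: {eff:5.1f} %")
```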
Previous scaling tests on the NCAR-Wyoming Supercomputing Center's (NWSC)
Yellowstone machine
suggest that for regular meshes, the
parallel efficiency of MPAS-A is correlated with the number of cells
owned per task. Considering the time required for the solver only,
i.e. neglecting the set-up costs and the disk I/O, a parallel
efficiency of close to 70 % is obtained for more than 160 cells
per task. Here, we include the set-up costs (bootstrapping and reading
of initial conditions file) and the output to the disk in the scaling to
emphasise the importance of all aspects of the system – from
filesystem performance to compute performance to the speed of the
interconnect – and to estimate the necessary resource
requirements. It should be noted that this can have a negative and
noticeable influence on the parallel efficiency, depending on the
performance of the parallel I/O operations and the ratio of the time
spent for the set-up of the model and the actual time
integration. Hence, the threshold of 160 cells owned per task for the
breakdown of the parallel efficiency should be considered as a lower
limit. In Sect. , we provide a detailed analysis of
the costs of the individual steps for the Jtest-full system.
Communication volume as function of number of tasks for the three
test cases (left: 120 km; middle: 100–25 km; right: 60–12 km).
The dashed lines correspond to a power-law relation with an index of 0.52. For large
numbers of tasks, runaway growth can be seen for the 120 km and, to some extent, also for the 100–25 km mesh.
On the three Linux-cluster type systems, test runs are conducted on
single nodes up to 32 (Curie), 20 (ForHLR1), 15 (Jtest-full) and 30
(Jtest-half) nodes. In the following, we refer to good scaling as
a parallel efficiency of ≈70 % or more. Good scaling is
achieved up to 6 nodes on Curie (96 tasks, or 427 grid cells per
task), 8 nodes on ForHLR1 (160 tasks, or 256 grid cells per task), 8
nodes on Jtest-full (224 tasks, or 183 grid cells per task) and 15
nodes on Jtest-half (210 tasks, or 195 grid cells per task). Comparing
Curie and Jtest-half, it is evident that (a) a single Haswell CPU with 14
cores outperforms two Sandy Bridge CPUs with 2×8 cores on the
same board, and that (b) the parallel efficiency decreases faster with
the number of nodes on Curie. This is probably related to the
interconnect: while Juropatest (as well as Yellowstone) uses
Infiniband FDR Full Fat Tree technology (theoretical effective,
aggregated throughput 56 Gb s-1 = 7 GB s-1)
for the inter-process communication (MPI) and a separate 10 Gb
Ethernet connection for I/O operations, Curie uses QDR Full Fat Tree
technology (quad data rate; theoretical effective, aggregated throughput
32 Gb s-1 = 4 GB s-1) for both the inter-process
communication and the I/O operations. The transition zone for
the breakdown of the parallel efficiency, between 600 and 150 owned cells
per task, is indicated as a shaded blue area in Fig. . A comparison of the absolute
run times on Jtest-half and Jtest-full shows that runs with 28 cores
per node are 5–15 % slower than runs with 14 cores per node,
which is presumably due to memory bandwidth bottlenecks. An exception
here is the 420-task run (30 nodes on Jtest-half, 15 nodes on Jtest-full),
for which the increase in inter-node communication is the limiting factor
(see also the discussion in Sect. ).
The minimum job (and block) size on Juqueen is 32 nodes or 512 tasks,
which corresponds to only around 80 cells owned by each task. The
parallel efficiency drops rapidly with an increasing number of nodes,
since this problem size is simply too small for application on
Juqueen. Figure (left panel)
displays the communication volume as a function of the number of tasks, which
follows a power law with an index of 0.52 up to about 1000 tasks (approx. 40 owned cells per task). A runaway growth can be seen for larger
numbers of tasks. At the same time, the
graph partitions become increasingly non-contiguous (not displayed).
It should be noted that the communication volume and number of
non-contiguous partitions are computed by METIS and as
such are a function of the partitioning of the cells only, which
essentially assumes a single layer of ghost cells. Since
MPAS-A exchanges two layers of ghost cells at maximum, the actual
number of ghost cells, edges and vertices can be slightly
different. A detailed study of the impact of these graph properties
will be given in the following section.
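The power-law index quoted above can be recovered from the METIS output by a straight-line fit in log-log space, as in the sketch below (the communication volumes are placeholders; in practice one would use the values reported by METIS for each partitioning).

```python
import numpy as np

# Placeholder (task count, communication volume) pairs.
tasks = np.array([16, 32, 64, 128, 256, 512, 1024])
volume = np.array([4.1e3, 5.9e3, 8.5e3, 1.22e4, 1.75e4, 2.51e4, 3.60e4])

# V ~ n^k  =>  log V = k * log n + c; the slope k is the power-law index.
k, c = np.polyfit(np.log(tasks), np.log(volume), 1)
print(f"power-law index: {k:.2f}")  # ~0.52 for these numbers
```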
(Left) approximate mesh cell sizes and (right) distribution of mesh cell sizes for
(top) the variable 100–25 km mesh with 163 842 grid cells and
(bottom) the variable 60–12 km mesh with 535 554 grid cells.
Table lists the cheapest model runs
in terms of CPUh spent per 24 h model integration and the fastest
runs in terms of real time per 24 h model integration for the four HPC
sites. It is important to remember that while the Jtest-half runs use
only 50 % of the available cores on each node, the computational
costs for the full node (28 CPUh per node per hour real time) are
charged for the model run, since the node is not available for other
users or jobs. Also, a one-to-one relation of CPUh between Linux
cluster-type machines and an IBM Blue Gene is not meaningful. By
comparing typical calls for proposals for the different HPC systems,
a conversion factor of 1:16 seems to be reasonable, i.e. to
consider one entire node with 16 cores on Juqueen as equivalent to one
core on the other systems. However, since applications for computing
resources usually demand estimates for the required amount of CPUh, we
list the actual CPUh here.
Variable 100–25 km grid
The second test case employs an irregular mesh with a variable
resolution ranging from 100 km for most parts of the globe to
25 km
for a circular area spanning about 60∘, and centred on West
Africa (lat = 12.5∘ N, long = 0∘ E). An
integration time step of 120 s is used. The mesh as well as the
frequency distribution of cell sizes are displayed in
Fig. (top panels), the scaling is illustrated in
Fig. (middle panel) and summarised in
Table .
Test runs are conducted on single nodes up to 192 nodes on Curie
(3072 tasks, 53 owned cells per task), 60 nodes on ForHLR1 (1200
tasks, 137 owned cells per task), 50 nodes on Jtest-full (1400 tasks,
117 owned cells per task) and 55 nodes on Jtest-half (770 tasks, 213
owned cells per task). Good scaling is achieved up to 32 nodes on
Curie (512 tasks, 320 owned cells per task), 20 nodes on ForHLR1 (400
tasks, 410 owned cells per task), 25 nodes on Jtest-full (700 tasks,
234 owned cells per task) and 35 nodes on Jtest-half (490 tasks, 334
owned cells per task). The parallel efficiency on average drops from
about 90 to 40 % within the transition zone (shaded area in
Fig. , 600 to 150 cells owned per task),
similar to the first test case on the regular 120 km mesh.
Notably different to the previous test case are the measured run times
for Jtest-full and Jtest-half: for small numbers of tasks, the
Jtest-full runs show a worse performance due to the aforementioned
memory bandwidth limitations. For large numbers of tasks (nodes), the
increase in inter-node MPI communication, which impacts the Jtest-half
runs more than the Jtest-full runs, becomes the limiting factor and
decreases the performance of the Jtest-half runs below that of the
Jtest-full runs. With respect to the remaining HPC systems, a clear
separation of the parallel efficiency by interconnect technology, as seen
for the 120 km test case, cannot be made here, for the following
reasons.
First, the disk I/O demand scales with the number of grid cells and
is larger by a factor of 4 for this mesh (see
Table ). As we will see in the
following sections, in particular for the extreme scaling experiments
in Sect. , the disk I/O becomes increasingly
important for larger problem sizes and can consume a significant part
of the total run time. The I/O is routed differently at the four HPC
sites, the central storage systems have different bandwidths and block
sizes, and the parallel I/O libraries might perform differently,
depending on the compilers and compilation flags. Additionally, in
this test case we adopt an integration time step of 120 s (4.8 s per
km grid size), which implies a smaller fraction of the total time
spent for the actual time integration relative to the disk I/O.
Ratio of (left) total number of cells, (middle) number of owned
cells and (right) number of halo cells per task between the non-contiguous
and the contiguous graph partition of the variable 100–25 km mesh for 300 tasks.
Second, the graph partitioning adds another layer of complexity and
variability to the performance diagnostics. Figure (middle panel) shows that the above-mentioned power-law relationship
between the communication volume and the number of tasks also holds for
the variable 100–25 km mesh up to about 40 owned cells per task
(4000 tasks). Contrary to the 120 km regular mesh, METIS tends
to create more non-contiguous partitions for complex mesh structures
(not displayed here). This variability introduced by the graph partitioning is unpredictable
and may have a significant impact on the model performance. If not
instructed otherwise, the graph partitioning tool METIS aims at
minimising the edge cut (communication volume), which potentially
comes at the cost of having non-contiguous partitions. As discussed
earlier in Sect. , halo cells are added
around each patch of the individual tasks, for which communication
with the neighbouring tasks is required. The additional amount of
communication caused by halo cells around non-contiguous partitions
can be substantial, in particular if the number of cells owned per
task is small (i.e. the ratio of halo cells to owned cells is
large).
To investigate the effect of non-contiguous partitions on the parallel
efficiency, we analyse one partition with 300 tasks for the
100–25 km mesh on ForHLR1 (546 owned cells per task), for which
METIS by default produces a non-contiguous partition. We create an
additional, contiguous partition using the command line arguments
-contig -minconn for METIS, which results in an increase of
the edge cut from 58 031 to 58 870
(1.4 %). Figure displays the
total number of cells (nCellsByTask), the number of owned cells
(nCellsSolveByTask) and the number of halo cells per task
(nHaloCellsByTask) as a ratio between the non-contiguous and the
contiguous partition. Since the communication volume increases for the
contiguous partition, the total number of cells per task on average is
larger, too. The average number of owned cells is identical, since the
number of cells of the graph does not change. Notably, task 200 has
a 1.3 times larger number of halo cells for the non-contiguous
partition, since its partition consists of two separate patches, which
implies a larger number of neighbouring tasks and surrounding halo
cells.
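The per-task quantities compared here can be derived directly from the connectivity graph and a partition file, as in the sketch below; the assumed file formats follow the METIS conventions (a header line with vertex and edge counts, then one adjacency list per cell; one part id per cell in the partition file). For simplicity, only the first halo layer is counted, whereas MPAS-A exchanges up to two layers.

```python
from collections import defaultdict

def read_graph(path):
    """Read a METIS graph file into a list of 1-based adjacency lists."""
    with open(path) as f:
        n, _ = map(int, f.readline().split())
        return [list(map(int, f.readline().split())) for _ in range(n)]

def read_partition(path):
    with open(path) as f:
        return [int(line) for line in f]

def per_task_stats(adjacency, parts):
    owned = defaultdict(int)
    halo = defaultdict(set)
    for cell, part in enumerate(parts):
        owned[part] += 1
        for nb in adjacency[cell]:       # nb is a 1-based cell index
            if parts[nb - 1] != part:
                halo[part].add(nb - 1)   # first halo layer of this task
    return owned, {p: len(c) for p, c in halo.items()}

# Assumed file names for the 100-25 km mesh partitioned for 300 tasks.
adjacency = read_graph("x5.163842.graph.info")
parts = read_partition("x5.163842.graph.info.part.300")
owned, halo = per_task_stats(adjacency, parts)
worst = max(halo, key=halo.get)
print(f"task {worst}: {owned[worst]} owned cells, {halo[worst]} halo cells")
```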
To eliminate the influence of the disk I/O on the run times for
the two partitions in this test, we switch off the output to the disk. We find that the
measured run times for the model integration are practically identical
for the two runs (251 s non-contiguous vs. 255 s contiguous). For
300 tasks, the average ratio of halo cells to owned cells is
1 : 2.8, which might be too small to see the effect of the
additional halo cells in the non-contiguous partition. We therefore
repeat the test for non-contiguous and contiguous partitions for 2520
tasks (65 owned cells per task), with a corresponding ratio of
1.2 : 1 halo cells to owned cells. Even in this case, the measured
run times for the model integration are nearly identical (45.2 s
contiguous vs. 45.6 s non-contiguous). We conclude therefore that
the impact of non-contiguous partitions on the run time is negligible
for any reasonable number of tasks for a given mesh.
Although the number of grid cells is 4 times larger for this test case
than for the regular 120 km mesh, the problem size is still too small
for application on Juqueen. The two smallest possible parallel runs
with 512 and 1024 tasks correspond to 320 and 160 cells owned per
task, for which the decrease in parallel efficiency is 20 %. Runs
with a larger number of tasks all have parallel efficiencies of less
than 60 %. Table lists the cheapest and
fastest model runs for the four HPC sites.
Variable 60–12 km grid
The third moderately sized scaling test consists of a variable-resolution mesh with maximum grid spacing of 60 km and minimum grid
spacing of 12 km. The refinement area is an approximate ellipse,
illustrated in Fig. (lower panels), and encompasses the whole
of north and central Africa, extending as far as India in the east and
covering a large part of the Atlantic Ocean in the west. This particular
mesh is useful for studying the teleconnection between the Indian and
Atlantic oceans and the monsoon systems in East and West Africa. The
total number of grid cells is 535 554, which corresponds to 1034×517 grid points on a regular latitude–longitude grid and thus
is in the range of current re-analyses. A time step of 72 s (6 s per
km grid size) is adopted.
Figure (right panel) and Table summarise the scaling of this
mesh on the four systems. As for the variable 100–25 km mesh,
a separation of the parallel performance by interconnect cannot be
detected, due to the increasing variability introduced by the disk I/O
(see Table ). While the number of tasks is sufficiently large to obtain
a constant power-law relationship between the communication volume and the number
of tasks up to 16 384 tasks on Juqueen (Fig. , right panel),
the complexity of the mesh leads to more non-contiguous partitions (not displayed).
Test runs are conducted on single nodes up to 384 nodes on Curie
(6144 tasks, 87 owned cells per task), 120 nodes on ForHLR1 (2400
tasks, 223 owned cells per task), 55 nodes on Jtest-full (1540 tasks,
348 owned cells per task), and 55 nodes on Jtest-half (770 tasks, 696
owned cells per task). Good scaling is achieved up to 48 nodes on
Curie (768 tasks, 697 owned cells per task), 45 nodes on ForHLR1 (900
tasks, 595 owned cells per task) and up to the maximum number of 55
nodes on Jtest-full and Jtest-half. In the transition zone between 600
and 150 owned cells per task (shaded in blue), the parallel efficiency
on Curie and ForHLR1 drops off quickly from about 80 % to as low
as 30 %.
Juropatest shows the best (although most irregular) scaling of the three
Linux-cluster systems. With a maximum of 55 available nodes
on the system, the parallel performance is better than 70 % for
both Jtest-half and Jtest-full. The parallel efficiencies for ForHLR1
and Curie follow a more regular trend, although the number of
non-contiguous partitions is highly variable for ForHLR1, but not for
Curie. This adds further support to our conclusion that the impact of
non-contiguous partitions on the run time is negligible for any
reasonable number of tasks for a given mesh. With 535 554 grid cells,
this test case also shows good scaling on Juqueen up to 128 nodes
(2048 tasks, 262 owned cells per task). For larger numbers of tasks,
the parallel efficiency drops rapidly and the number of
non-contiguous partitions becomes highly variable.
Table lists the cheapest model runs
in terms of CPUh spent per 24 h model integration and the fastest
runs in terms of real time per 24 h model integration for the four HPC
sites.
Total run times (s) for the 120 km regular mesh with 40 962 grid cells,
the variable 100–25 km mesh with 163 842 grid cells and the variable 60–12 km
mesh with 535 554 grid cells. Indicated are time ratios between the Linux-cluster systems
Curie, ForHLR1 and Juropatest, and the Blue Gene system Juqueen. Fits to the
total run times are obtained separately for the Linux-cluster systems and the
Blue Gene system, using a modified version of Amdahl's law (see Eq. and Table ), and are displayed as dashed black lines.
Comparison of HPC systems
We further address the question of how the total run times compare on the
various HPC systems. Figure displays the total
run times for the 24 h model runs on the four systems.
For all test cases, the three Linux-type systems line up quite well,
which means that the absolute run times are very close for similar parallel
decompositions and that differences in the processor architectures do not
translate into model speed-up. Due to the large differences in tasks
between the Linux-cluster systems and the Blue Gene system, the problem
size must be large enough for a fair comparison. Among the three test
cases, only the 60–12 km mesh test case satisfies this requirement.
Fit coefficients of the modified Amdahl's law (Eq. ) for the three test cases and the two classes of architectures, with a common value for the exponent X = 2.85.
For the Blue Gene system, only the fit for the 60–12 km mesh should be used to estimate the required computing times.

Mesh         Architecture     T(1) (s)      A           B
120 km       Linux-cluster    18 652        1.5×10^-3   3.3×10^-4
             Blue Gene        413 440       1.8×10^-4   3.3×10^-5
100–25 km    Linux-cluster    86 523        8.5×10^-4   5.2×10^-5
             Blue Gene        1 274 880     2.4×10^-4   5.5×10^-6
60–12 km     Linux-cluster    422 424       4.9×10^-4   1.4×10^-5
             Blue Gene        6 579 200     8.4×10^-5   6.1×10^-6
A common feature across the three test cases is a minimum of the
total run times on the three Linux-type systems around 150 owned cells per task.
The reason for the increase in total run times for smaller numbers of owned
cells per task will be discussed in the following section. Here, we
attempt to fit a modified version of Amdahl's law, which
also accounts for the observed increase in run times when the parallel performance
breaks down:

T(n) = T(1) [ A + (1 − A − (nB)^X)/n + (nB)^X ].
Here, n denotes the number of parallel tasks and T(n) the total run time in seconds.
The first term on the right-hand side of Eq. () represents the serial
part of the code, the second term the part that scales perfectly and the third term
the part that takes longer when more tasks are used. This allows one to quickly estimate
the required resources for the test cases presented here for Linux-type systems (and
Blue Gene/Q systems) with similar specifications.
The coefficients A and B are allowed to vary across the three test cases and two
different classes of architectures. The power-law index X, on the other hand, is derived
as the best fit for the three test cases on the Linux-type systems and the 60–12 km on
the Blue Gene system. This results in a value of X=2.85 with a good fit to all four curves,
which indicates that the increase in total run times below a certain number of owned cells per task
occurs for the same reason (see also the discussion in Sect. ).
Required time for a 24 h integration (s) split into time
integration, model set-up and disk I/O for the three test cases on Jtest-full. Also
displayed are the total time and a combination of time integration and disk
I/O to reflect the parallel performance of MPAS-A for longer model runs, for which the initial set-up costs are amortised.
Table summarises the coefficients obtained from fitting
Eq. ()
to each of the six curves. Note that the fits to the 120 km mesh and the 100–25 km mesh on the
Blue Gene system are listed here as well, but should be used with caution. The minimum of T(n) is obtained by setting
∂T(n)/∂n = 0 at n = nmin and solving
for the number of owned cells per task No,min = Ncells/nmin.
This leads to minimum values of No,min={143,146,182} owned cells per task
and Tmin={116,177,402}s for the three Linux-type systems on the
120 km
mesh, the 100–25 km mesh and the 60–12 km mesh, respectively. For the Blue Gene
system, only the fit for the 60–12 km mesh test case can be considered as realistic and leads to
No,min=97 owned cells per task and a minimum run time of Tmin=2155s.
From the discussion in this section, we infer that a lower limit of about 150 owned cells per task
is a good and quick estimate for the Linux-type systems below which the parallelisation becomes highly
inefficient. The shift of the minimum value to larger numbers of tasks (lower numbers of owned cells
per task) on the Blue Gene system can be attributed to the performance of the faster 5-D Torus
interconnect relative to that of the slower CPU cores. The resulting fitting curves are also displayed
in Fig. .
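As a worked example, the sketch below evaluates Eq. () with the Linux-cluster coefficients for the 120 km mesh from the table above and locates the minimum numerically; it closely reproduces the quoted No,min ≈ 143 owned cells per task and Tmin ≈ 116 s.

```python
from scipy.optimize import minimize_scalar

def runtime(n, t1, a, b, x=2.85):
    """Modified Amdahl's law: T(n) = T(1) * [A + (1 - A - (nB)^X)/n + (nB)^X]."""
    return t1 * (a + (1.0 - a - (n * b) ** x) / n + (n * b) ** x)

# Linux-cluster fit for the 120 km mesh (40 962 cells), from the table above.
t1, a, b, ncells = 18652.0, 1.5e-3, 3.3e-4, 40962

res = minimize_scalar(lambda n: runtime(n, t1, a, b),
                      bounds=(1, 5000), method="bounded")
print(f"n_min ~ {res.x:.0f} tasks, "
      f"No,min ~ {ncells / res.x:.0f} owned cells per task, "
      f"T_min ~ {res.fun:.0f} s")
```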
Breakdown of parallel performance
In the following, we investigate the reasons for the breakdown of the
parallel performance. In the previous scaling tests, we include the
model set-up – in principle, bootstrapping and reading of the initial
conditions file – as well as the parallel output to the disk in the
measurements. Here, we split up the total computational costs into
three different steps: time integration,
model set-up and disk I/O. Over longer runs, the model
initialisation costs are amortised and the parallel performance is
determined by the numerical solver (time integration) and the parallel
I/O (disk output, reading of boundary updates). Accordingly, Fig.
displays the total computational costs, the costs for the three steps
and a combination of time integration and disk I/O for the
Jtest-full system and all test cases. Each time, the parallel
efficiency of the combined time integration and disk I/O, indicated with
orange stars, starts to decrease for less than 600 owned cells per task.
The time integration alone shows nearly ideal scaling down
to 150 owned cells per task, while the model set-up and disk I/O
are only weakly dependent on the number of tasks.
According to Figs. and ,
the absolute run times for all three test cases start to increase for
less than 150 owned cells per task on the three Linux-cluster systems. For the
Blue Gene system, this increase in absolute run times takes place at smaller numbers
of owned cells per task, but the picture is less clear due to the small problem sizes.
However, given the faster interconnect on the Blue Gene system and the fact that the
model initialisation and disk I/O depend only weakly on the number of tasks, this
suggests a breakdown of the parallel efficiency of the solver (i.e. the time integration) itself.
To explore the limiting factors of the solver for a large number of tasks,
we use the parallel debugging and
profiling tools Scalasca and
Score-P. These tools
are available as modules on Juqueen; hence, we focus on three particular runs
on the 60–12 km mesh on this system. We analyse in detail the 2048-task run
(261 owned cells per task), the 4096-task run (130 owned cells per task) and the
8192-task run (65 owned cells per task), for which the latter takes longer
than the former two runs (Fig. ).
Halo cells are added around each patch of the
individual tasks, for which communication with the neighbouring tasks
is required. The larger the number of tasks, the smaller is the number of
owned cells per tasks, and the larger the ratio between halo cells and
owned cells. A lower bound for the number of halo cells Nh for a given
number of owned cells No is given by Eq. ()
and corresponds to a ratio of No:Nh=261:127=2.1 for the 2048-task run,
131:94=1.4 for the 4096-task run and 65:70=0.94 for the 8192-task run.
Table summarises the percentages of time spent
during the time integration step for these runs. Leaving aside the model initialisation
(initial bootstrapping, reading of initial conditions), the total time spent
for communication increases from 35 % for 2048 tasks to 70 % for 8192 tasks.
A detailed analysis of the Scalasca report reveals that most of
this time is wasted in MPI_WAIT during the exchange of 2-D and 3-D
halo fields. About 16 % of the time is spent for parallel I/O, namely for building
and writing output streams and for reading boundary updates (sea surface
temperature, sea ice fraction). It should be noted here that the selection
of fields written to the disk is customisable at run time using simple ASCII
files. Table reports on the number of 3-D and 4-D fields written
to the disk in our tests.
The Scalasca reports also reveal that
computational imbalances in the parallelisation can have serious
impacts. For the example of the 2048-task run on the 60–12 km mesh,
the average time required to update the boundary information
(sea surface temperature, sea ice fraction) is about 11.8 s per
task. However, one single task takes 19.3 s for the same action and
blocks all remaining tasks in their execution. Since the model synchronisation takes
place when exchanging halo cell data, the different computing times
of the model physics appear in the time spent for communication.
These results highlight the importance of an efficient parallelisation
and fast interconnects between the compute nodes and to the central
storage, in particular for future applications on massively parallel
systems and for the extreme scaling tests in Sect. .
Reproducing the dynamics of the West African monsoon
The second important aspect of a numerical weather prediction and
climate model is its accuracy in reproducing observed meteorological
conditions on global, regional and local levels. As described in the
introduction, the WAM has turned out to be
a notoriously difficult problem in climate modelling, since it is
a complex interplay of various dynamical, microphysical and
surface-related processes across scales. The common understanding is
that the intensity of the monsoon and the associated precipitation on
local scales are regulated by the driving, large-scale atmospheric
circulation. This picture is challenged and complicated by a recent
study, which found that processes on a local scale,
such as mesoscale convection and precipitation events, can have
a noticeable influence through feedback effects on the entire monsoon
system. Examples thereof are enhanced moisture transport and
circulation, and a strengthening of westward travelling disturbances
(African easterly waves).
In this study, we attempt a first and brief evaluation of the ability
of MPAS-A to reproduce the dynamics of the WAM. Due
to computational limitations for this short comparison, we are
restricted to 11-month long model runs. We focus on the onset of the
monsoon season in June/July 1982 for two of the meshes presented
above, namely the regular 120 km mesh and the variable
60–12 km
mesh. Both models are initialised in September 1981 using NCEP Climate
Forecast System Reanalysis (CFSR) data at
0.5∘×0.5∘ resolution as initial
conditions. Daily updates of the sea surface temperature are taken
from the NOAA Optimum Interpolation Sea Surface Temperature Analysis
(NOAA OI SST) at 0.25∘×0.25∘ resolution. The period from September 1981 to the beginning
of 1982 is considered as spin-up time for the model, in particular for
the soil conditions. An analysis of the soil properties over
the 11-month runs is undertaken to investigate whether such a short
spin-up time is sufficient. The model output is compared to different sets of
observational data to account for the large uncertainty in the gridded
observational products in the data-sparse region of West Africa.
Model topography (terrain height in metres) for the WRF reference and the
two MPAS model runs for the West African WRF domain. The left panel also indicates
five distinct agro-climatical zones, following a gradient of decreasing annual precipitation from south to north.
For near-surface air temperature and precipitation, we refer to (1) the
Climate Research Unit (CRU) high-resolution gridded time-series
data set v. 3.22 , and (2) the University of
Delaware (UDEL) air temperature and precipitation long-term monthly
means V3.01 at 0.5∘×0.5∘.
Additional observational data for near-surface air temperature are
obtained from the Global Historical Climate Network (GHCN) gridded
2 m temperature data set V2 at
0.5∘×0.5∘. For precipitation, we also use the
Global Precipitation Climatology Centre (GPCC) full data re-analysis
version 6 monthly means, also at
0.5∘×0.5∘.
We also compare the MPAS-A model output to reference data obtained from a set
of novel regional climate simulations over West Africa within the WASCAL
program. These data are produced using
the regional climate model WRF at 12 km resolution with initial and lateral
boundary conditions provided by the ERA-Interim re-analysis
(approx. 80 km resolution). The WRF model uses a set-up that is optimised for the
region of West Africa, following a detailed analysis of the monsoon dynamics
for different WRF model configurations. An extensive
documentation and analysis of this reference data set will be given in
a forthcoming paper. The WRF model run covers a region from 25∘ W to
25∘ E and 5∘ S to 25∘ N and is initialised in
January 1979. Spectral nudging is applied to the WRF limited area model to
keep it on track with the large-scale features of the driving ERA-Interim
re-analysis. The sea surface temperature is updated every 6 h from the
ERA-Interim SST data.
The WRF model domain lies entirely within the refinement zone of the variable
60–12 km mesh. Hence, the resolution of the MPAS-A runs is 120 km for the
regular mesh (MPAS-120r hereafter) and 12 km for the variable mesh (MPAS-12v
hereafter). Figure shows the model topography of the WRF
reference model at 12 km resolution (WRF-12r hereafter) and the
MPAS-A models. Also indicated is a classification of the land area into five
agro-climatical zones, which will be used in the further analysis. Naturally,
the topography is nearly identical for WRF-12r and MPAS-12v. Minor
differences can be seen along the coastline and inland water bodies, which
are caused by the different grids and by the need to re-grid the MPAS model
output onto a regular lat–long grid. This re-gridding is performed with an
NCL script
mpas_to_latlon.py (L. Fowler, personal communication, 2014), which
adds another step to the post-processing tasks and is computationally quite
demanding; for example, a minimum of 1.5 GB of memory and 10 s run time is
required to re-grid one 3-D variable of MPAS-12v for one time step. For
MPAS-120r, the terrain is smoothed out and the coastline (often referred to
as the landmask in the models) is ill represented. This has negative effects when
verifying the model output over regions such as Guinea, as we will see below.
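To give an idea of what such a re-gridding step involves, the sketch below performs a crude nearest-neighbour mapping from unstructured cell centres to a regular latitude-longitude grid (this is an illustration only, not the mpas_to_latlon.py utility, which performs a more careful interpolation).

```python
import numpy as np
from scipy.spatial import cKDTree

def to_unit_vectors(lat, lon):
    """Convert latitudes/longitudes in radians to 3-D unit vectors."""
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

def regrid_nearest(cell_lat, cell_lon, field, res_deg=0.25):
    """Sample an unstructured cell-centre field onto a regular lat-lon grid
    by nearest neighbour, using chordal distances on the unit sphere."""
    tree = cKDTree(to_unit_vectors(cell_lat, cell_lon))
    lats = np.deg2rad(np.arange(-90.0, 90.0, res_deg))
    lons = np.deg2rad(np.arange(-180.0, 180.0, res_deg))
    glon, glat = np.meshgrid(lons, lats)
    _, idx = tree.query(to_unit_vectors(glat.ravel(), glon.ravel()))
    return field[idx].reshape(glat.shape)

# Usage with synthetic cell centres standing in for the 60-12 km mesh.
rng = np.random.default_rng(0)
cell_lat = np.arcsin(rng.uniform(-1.0, 1.0, 535554))
cell_lon = rng.uniform(-np.pi, np.pi, 535554)
regular = regrid_nearest(cell_lat, cell_lon, np.sin(cell_lat))
```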
(Top) mean near-surface temperature (over land) in ∘C for July 1982
for the three observational data sets CRU, UDEL, GHCN, the WRF reference and the two
MPAS model runs; (bottom) annual cycle of mean near-surface temperature over
the entire land area and the five agro-climatical zones depicted in Fig. .
(Top) monthly precipitation (over land) in millimetres for July 1982 for the three
observational data sets CRU, UDEL, GPCC, the WRF reference and the two MPAS model
runs; (bottom) annual cycle of monthly precipitation over the entire land
area and the five agro-climatical zones depicted in Fig. .
In Fig. , we analyse the spatial distribution of the
near-surface temperature for July 1982 (top panel) and its annual
cycle between September 1981 and July 1982 (bottom panel). The
observations provide data over land only and show noticeable spatial
differences in the position and intensity of the Saharan heat low
(SHL), in particular between CRU/UDEL and GHCN, and minor differences
over Ghana and along the coastline. Regarding the model runs, WRF-12r
matches the position and temperature of the SHL best and reproduces
the observed temperatures. Both MPAS models show a slightly colder
surface temperature distribution for July 1982, and the cold bias in
MPAS-120r is larger on average. The sea surface temperature
distribution is similar for the two MPAS runs, since they are using
the same SST data for their daily updates. However, MPAS-120r shows
strong artefacts along the coastline of the Gulf of Guinea, which is
due to its inaccurate landmask. The WRF-12r SST, which is updated
every 6 h from ERA-Interim data, is colder over the Gulf of Guinea,
but otherwise shows the same patterns. With respect to the temporal
evolution, the annual cycle is reproduced well for both MPAS models;
however, a significant cold bias is detected over most of the land area
from the time of model initialisation in September 1981 up to
May 1982. From June to July 1982, this bias seems to nearly vanish
with the exception of the coarser MPAS-120r over Guinea due to the
limited resolution of the coastline. The observational data sets
displayed in the lower panel agree in general, but show small
differences for the Saharan region (only CRU and GHCN are displayed
for clarity; UDEL is very close to CRU).
Figure likewise displays the spatial distribution of
the monthly precipitation for July 1982 (top panel) and its annual
cycle (bottom panel) for the three observational data sets and the
three models. At the onset of the monsoon season, the rain band is
centred over 10∘ N for all three observational data
sets. Areas of high precipitation are found over Guinea, Sierra Leone
and the Senegal in the west, and over Nigeria, Cameroon and the
Central African Republic in the east. While the spatial distribution
hardly shows any difference between the observational data sets, the
temporal evolution deviates for the Saharan and Guinea regions. For
the Saharan region, small absolute values and a very small number of
actual observations lead to large relative uncertainties. For Guinea,
the spatial interpolation along the coastline leads to differences in
the timing of the maximum precipitation (April 1982 for CRU, May 1982
for GPCC; as for temperature, UDEL follows CRU closely).
The three models successfully reproduce the location of the rain band,
a fact that should not be taken for granted. In the case of WRF,
it has been demonstrated that the representation of the monsoon
dynamics is largely determined by the microphysics and the planetary
boundary layer schemes, while the cumulus scheme seems to play more of
a role on daily timescales and for the actual amount of precipitation
triggered by mesoscale convection. The WRF model configuration chosen
here uses the WSM5 microphysics scheme, the ACM2 planetary boundary
layer parameterisation and the Grell–Freitas cumulus
scheme. It is highly optimised for the region and thus
not only matches the location of the rain band but also reproduces
the observed precipitation patterns over the Soudano, the Sahel and
the Sahelo regions. The two MPAS model runs, on the other hand, use an
out-of-the-box set-up, which consists of the WSM6 microphysics
scheme (similar to WSM5 with graupel as additional hydrometeor), the
YSU (Yonsei University) planetary boundary layer parameterisation and the Kain–Fritsch
cumulus scheme. This particular combination produces excessive
precipitation during the peak of the monsoon in WRF due to
a non-linear response of convective precipitation events to the
dynamics, and it seems to exhibit the same behaviour in the two MPAS
models.
(Top) mean sea level pressure, (middle) mean soil temperature and
(bottom) mean relative soil moisture over land for July 1982 for the WRF reference and the MPAS model runs.
Annual cycle of (left) mean sea level pressure in hPa, (middle)
mean soil temperature in ∘C and (right) mean relative soil moisture in % over land for the WRF reference and the MPAS model runs.
With respect to the annual cycle of precipitation, the WRF model shows
excessive precipitation for all months and an early onset of the
monsoon season for the Soudano, Sahel and Sahelo regions. The excess
in rainfall over all land area is mainly caused by an overestimation
over Guinea, which receives the most rainfall over the year (more than
3000 mm year⁻¹, compared to less than 500 mm year⁻¹ in the Sahelo
region). The out-of-the-box MPAS models match the observations
better, in particular the timing of the onset of the rainy season and
the precipitation over Guinea. The coarser MPAS-120r is closer to the
observed precipitation over the inland areas than the
higher-resolution MPAS-12v during the onset of the monsoon in
June/July 1982. This, however, is not a signal of a higher accuracy of
the coarser model: at 120 km resolution, the Kain–Fritsch cumulus
scheme is less active than at 12 km resolution, and consequently its
wet bias is reduced.
To support this further, Fig. a displays the
mean sea level pressure map for July 1982. WRF-12r and MPAS-12v both
show a distinct area of low pressure, the SHL,
alongside the south–north pressure gradient that is causing the
movement of the ITCZ and the associated monsoon rain band. This SHL is
much less pronounced in MPAS-120r. In both MPAS models, the region of
low pressure is displaced by about 10° to the east compared to
WRF. A climatological mean position of the SHL of
3° W, 23° N has been derived from re-analysis data between 1979 and
2001, which coincides well with the WRF position.
However, the SHL is not only a region of low surface
pressure but also of high surface temperatures. Comparing
Figs. and a,
it is apparent that the areas of low pressure and high temperature are
co-located for WRF-12r. The high-temperature regions
of MPAS-120r and MPAS-12v match the WRF SHL position, albeit with a cold
bias of about 2 °C, while the low-pressure region is shifted
to the east. This shift is caused by the overall cold bias in the two
MPAS models: the SHL, like any other non-frontal thermal low,
is formed by rapid solar heating of the
land surface and the near-surface layers, which leads to the rising of
hot, less dense air and to the formation of a stationary low-pressure
area. Thus, the cold bias in the MPAS models implies a less intense
deepening of the pressure system at the expected location of the
SHL. As discussed above, the general cold bias over all land areas
persists from the onset of the MPAS model runs in September 1981 and
only starts to vanish from June to July 1982. This is reflected in
the temporal evolution of the mean sea level pressure
(Fig. , left panel), integrated
over all land areas, which is highest for MPAS-120r, lowest for
WRF-12r and in between for MPAS-12v, with the three curves converging
from June 1982 onwards.
A final investigation of the soil properties in
Figs. b and c
and (middle and right panel)
reveals the root cause of the cold temperature bias and the associated
displacement of the low-pressure system. Starting from the MPAS model
initialisation in September 1981, the soil temperature is about
2 °C lower than for the WRF model run, which is initialised
in January 1979 and thus has more than 2 years to spin up the
land surface model (community Noah land surface model for both WRF and MPAS). Over the course
of the simulated 11 months, the MPAS soil temperatures start to
converge towards the WRF soil temperature: in July 1982, the soil
temperature distributions in Fig. b are
already quite similar. However, the relative soil moisture maintains
a constant bias for most of the time and still shows spatial
differences in July 1982. This is due to the fact that between the
MPAS model initialisation in September 1981 and the onset of the
monsoon season in 1982, virtually no precipitation occurs over large
fractions of the land area and that the low initial soil moisture
therefore has no opportunity to adjust.
We therefore conclude from this section that the concept of a global, variable-resolution mesh and its implementation in MPAS-A have the potential
to reproduce the dynamics of the WAM and its
associated precipitation. In this single, 11-month experiment, the
out-of-the-box set-up of MPAS can compete with the optimised WRF
set-up with respect to the timing of the onset of the rainy season.
The high-resolution MPAS-12v model shows a better performance than
the coarse-resolution MPAS-120r model in general, but in particular
for the coastal regions. Deficiencies in the location of the SHL and a cold bias in the surface temperature are apparent in
both MPAS model runs, which are caused to a large extent by an
insufficient spin-up time of the models. Detailed investigations
of the WRF reference run, not discussed here, suggest that a spin-up
time of at least 1 year is required in order to adjust the soil moisture.
The MPAS runs also tend to overestimate the monsoon precipitation,
which we attribute to the combination of physical parameterisations
of the out-of-the-box set-up.
Extreme scaling experiment at very high resolution
In this section, we evaluate the scalability and limitations of the MPAS-A model for a very large mesh on a massively parallel system. This experiment is motivated by the following two aspects:
Future HPC environments will most likely be massively parallel
systems, with the number of cores per node and the number of nodes
increasing much faster than the speed of the individual
cores. Models such as MPAS-A have to be able to scale on these
systems in order to be used successfully in the future.
The typical model resolution of global and regional NWP and
climate models has increased continuously over the past few
decades. Currently, the European Centre for Medium-Range Weather
Forecasts (ECMWF) is leading the field with an operational global
NWP model at 14 km resolution, and limited area
models have been taken down to sub-kilometre resolution. With this
experiment, we want to demonstrate that convection-resolving
(below about 4 km grid size) global atmospheric simulations are
possible on current HPC environments.
To create a large-enough problem for these tests at a convection-resolving
resolution, we use a regular, global 3 km mesh with more than 65 million
horizontal grid cells and 41 vertical levels. Up to now, only a few real-data
simulations on this mesh have been conducted on NWSC's Yellowstone
supercomputer using a maximum of 16 384 MPI tasks (approx. 4000 owned cells
per task); a set of scaling benchmarks based on an idealised case has also
been run on up to 131 072 MPI tasks on Edison, a Cray XC30 at the National
Energy Research Scientific Computing
Center.
Among the four HPC sites presented earlier, only the FZJ Juqueen
offers a large enough number of cores to conduct this extreme scaling
experiment. With a maximum of 458 752 cores, these tests require
scaling out to the full system, which is not possible during normal
operations. However, following the 3rd Juqueen Tuning and Porting
Workshop in February 2015, a few selected applicants were invited to
conduct extreme scaling tests on the full machine during a period of
24 h. The results presented in the following were obtained during
this event, which is summarised in a technical report.
Model configuration and experiment preparation
Several modifications of the MPAS-A code are required to conduct this
experiment. First, it is no longer possible to use the standard
NetCDF large-file format (CDF-2), since the maximum number of elements
for some of the 4-D variables exceeds the internal limit of
2 billion values for any particular record. One possible solution is
to use the CDF-5 extension of CDF-2, which is supported by the
Parallel-NetCDF library. However, only very few applications
understand this format, and initial testing on Juqueen revealed
problems reading correct data in massively parallel read
operations. Another solution, which is adopted here, is to use the
newer NetCDF-4 format, which supports parallel I/O through PHDF5. This
requires upgrading the parallel I/O library PIO from v1.7.1 to 1.9.15,
and modifying the I/O framework of the MPAS-A model code. Motivated by
this study, MPAS-A v4.0, released at the time of writing, supports
NetCDF-4 I/O without any need to modify the software framework.
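To illustrate the format constraint, the following minimal Python sketch checks whether a single record of a 4-D field on the 3 km mesh exceeds the per-record element limit of CDF-2; the exact cell count used here is an assumption for illustration:

    # Minimal sketch: one record of a 4-D field (time, cells, levels)
    # on the 3 km mesh versus the CDF-2 per-record element limit.
    N_CELLS = 65_536_002   # horizontal cells of the global 3 km mesh (assumed exact count)
    N_LEVELS = 41          # vertical levels
    CDF2_LIMIT = 2**31     # roughly 2 billion elements per record

    elements_per_record = N_CELLS * N_LEVELS
    print(f"elements per record: {elements_per_record:,}")             # 2,686,976,082
    print(f"exceeds CDF-2 limit: {elements_per_record > CDF2_LIMIT}")  # True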
Further, the generation of the model input data becomes a large
computational problem, which cannot be handled on a single machine due to
time constraints and memory limitations. In MPAS release v3.1, the
pre-processing of the data is partly a serial process running on one
CPU core only. Hence, a parallelisation of the pre-processing is
required in addition to the above changes to the I/O routines. These
changes are applied to the basic MPAS-A v3.1 code, and this modified
version is used in the following scaling experiments.
Another difference from the default model configuration presented in
Sect. is that each test is run for 1 h
model time only. This implies that the update of the surface data
(sea surface temperature, sea ice fraction), which occurs every 24 h
in the above moderate scaling tests, is dropped. Also, with a global
resolution of 3 km, it is generally agreed that no convection scheme
is required, since the microphysics scheme is able to generate the
convective precipitation systems on the grid scale. These
modifications are reflected in the model namelist in
Appendix .
The pre-processing consists of two steps. First, a static data set is
produced (static.nc), which maps invariants such as terrain
height, landmask and land use classification onto the 3 km
mesh. These invariant fields are required as input for the subsequent
generation of the initial conditions (init.nc), in part
because the terrain field is necessary for the generation of the
height-based vertical grid. The parallel pre-processing steps require
special mesh partitions, different from the partitions generated by
METIS and used for the model runs. Here, both steps are conducted with
576 cores, spread across 80 nodes on the ForHLR1 to provide sufficient
memory for each task. Each step takes about 1 h 15 min real time and
requires around 4.6 TB of the available 5.1 TB of memory. The
resulting initial conditions file has a size of 1.2 TB and is
transferred to Juqueen over the 10 Gb s⁻¹ internet connection between the
two HPC sites. Since both pre-processing steps are only required once,
no further investigation or optimisation of the run times is attempted.
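As a side note, the quoted file size and link speed yield a lower bound on the transfer time; a minimal sketch, idealised in that it ignores protocol overhead and contention on the shared link:

    # Idealised lower bound for transferring the 1.2 TB initial
    # conditions file over the 10 Gb/s link between the two HPC sites.
    file_size_bytes = 1.2e12   # 1.2 TB
    link_rate_bps = 10e9       # 10 Gb/s

    transfer_s = file_size_bytes * 8 / link_rate_bps
    print(f"ideal transfer time: {transfer_s:.0f} s (~{transfer_s / 60:.0f} min)")  # ~16 min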
MPAS-A 3 km global simulation experiment (strong scaling tests).

Nodes     Tasks (MPI only)    Bootstrapping (s)    Initial read (s)
4096      65 536              1260                 1260
8192      131 072             1370                 590
16 384    262 144             1560                 1020
24 576    393 216             1680                 1080
28 672    458 752             1740                 1140
Nodes     Integration time for     Parallel efficiency     Integration in
          1 h model time (s)       (integration only)      24 h wall time^a
4096      1760                     100.0 %                 29 h
8192      960                      91.2 %                  53 h
16 384    490                      90.1 %                  104 h
24 576    335                      87.7 %                  152 h
28 672    360                      69.5 %                  141 h

^a Estimated time extrapolated from the 1 h integration, including initial bootstrapping/reading, output to the disk and a 12 s time step.
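The parallel efficiencies can be approximately recomputed from the integration times; the following sketch uses the rounded timings from the table, so the results deviate slightly from the quoted (unrounded) percentages:

    # Strong-scaling efficiency relative to the 4096-node baseline:
    # eff(n) = t(4096) * 4096 / (t(n) * n), using the rounded
    # integration times per 1 h model time from the table.
    timings_s = {4096: 1760, 8192: 960, 16384: 490, 24576: 335, 28672: 360}

    base_nodes = 4096
    base_time = timings_s[base_nodes]
    for nodes, t in sorted(timings_s.items()):
        eff = base_time * base_nodes / (t * nodes)
        print(f"{nodes:6d} nodes: {100 * eff:5.1f} %")  # 100.0, 91.7, 89.8, 87.6, 69.8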
First attempts and optimisations
Initial test runs at 3 km resolution revealed previously unknown
problems on the system. As described in
Sect. , a bootstrapping step is required
during the model initialisation to set up the grid and instruct
individual tasks with whom to share information about neighbouring
grid cells. In the MPAS code, this is implemented using hash
tables. In order to complete the bootstrapping in a reasonable time,
the hash table size (parameter TABLESIZE) is increased from
the default value 27 183 to 6 000 000. After this adjustment, the
bootstrapping step takes between 18 and 29 min on Juqueen, depending
on the number of MPI tasks (see
Table ). A second bottleneck is the
reading of the initial conditions file. Performance improvements for
this step are achieved by setting two run time environment variables
that were presented during the tuning and porting workshop
(BGLOCKLESSMPIO_F_TYPE, ROMIO_HINTS), and by
optimising the number of I/O tasks. While in the previous scaling
tests all tasks participate in the I/O, we find improvements when
using only 128 I/O tasks per 1024 nodes (1 rack, 16 384 tasks),
with an average read performance of 1.2 GB s⁻¹
and an average write performance of 0.6 GB s⁻¹. Due to the large number
of parallel tasks in this extreme scaling experiment, we use the
unit nodes instead of tasks throughout this section,
where 1024 nodes correspond to 1 rack or 16 384 tasks.
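For illustration, the resulting I/O task layout can be written down as a small sketch, assuming (as stated above) 16 tasks per node and 128 I/O tasks per rack of 1024 nodes:

    # I/O task layout found to perform well on Juqueen: 128 I/O tasks
    # per rack (1024 nodes, 16 384 MPI tasks), i.e. every 128th task.
    def io_layout(nodes, tasks_per_node=16, nodes_per_rack=1024, io_per_rack=128):
        tasks = nodes * tasks_per_node
        io_tasks = nodes // nodes_per_rack * io_per_rack
        return io_tasks, tasks // io_tasks   # I/O task count and stride

    for nodes in (4096, 8192, 16384, 24576, 28672):
        io_tasks, stride = io_layout(nodes)
        print(f"{nodes:6d} nodes: {io_tasks:5d} I/O tasks (stride {stride})")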
Required times for individual steps of the 3 km test runs on Juqueen (18 s time step).
With these optimisations, the parallel reading of the initial
conditions file improves slightly with the number of tasks, since they
are located in different racks. Conversely, the bootstrapping step
takes longer for larger numbers of tasks. Hence, the overall model
initialisation takes approximately 45 min for the 3 km mesh and
varies only slightly with the number of nodes. One notable
exception here is the run on 8192 nodes, for which the
initial read takes only about half as long as for the other runs. The
exact reasons for this behaviour remain to be investigated, but we
suspect that this combination of file size and I/O tasks constitutes
a sweet spot on Juqueen.
Execution of extreme scaling tests
The substantial memory requirements for the 3 km mesh do not allow one to run
the model on 1024 or 2048 nodes. The baseline for our scaling experiment is
therefore the run on 4096 nodes (4 racks, 65 536 MPI tasks, 512 I/O tasks, 65 TB
memory). Contrary to the model initialisation, the time integration step
scales very well up to the entire machine, with a parallel efficiency of
87 % for 24 576 nodes (393 216 tasks, 167 cells per task) compared to the baseline (see
Table ). The test run on the full system with 28 672 nodes (458 752 tasks, 143 cells per task)
shows a lower performance than the run on 24 racks and a parallel efficiency of nearly 70 %,
which confirms the above-mentioned limit of approximately 150 owned cells per task, below which the
parallelisation becomes inefficient.
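The quoted cells-per-task values follow directly from the mesh size; a quick sketch, again assuming an exact cell count of 65 536 002 for the 3 km mesh:

    # Average owned cells per MPI task on the 3 km mesh, used to judge
    # the ~150-cells-per-task limit of the parallelisation.
    N_CELLS = 65_536_002   # assumed cell count of the global 3 km mesh

    for tasks in (16_384, 65_536, 131_072, 262_144, 393_216, 458_752):
        print(f"{tasks:7,d} tasks: {N_CELLS / tasks:7.0f} owned cells per task")
    # 4000, 1000, 500, 250, 167, 143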
All scaling tests are conducted with an 18 s model integration time step for a 1 h model time.
However, we find that in order to keep the model stable when starting the
3 km mesh from initial conditions derived from a 48 km re-analysis data set
(CFSR), a more conservative time step is required. MPAS currently lacks
a dynamical initialisation system (e.g. digital filters, adaptive
time-stepping), which could avoid this issue. Model instabilities leading to
NaNs can affect the performance in different ways: (1) the performance might
increase in cases where if-NaN tests cause the code to return early from
computationally intensive physics routines, or (2) the performance might
decrease due to the continuous generation of floating-point exceptions.
We note that on Blue Gene systems, MPAS does not by default include the -qflttrap compiler
flag to trap floating-point exceptions, and, consequently, the model
will continue to run even when the simulation generates NaNs.
We therefore repeat the runs for 4096, 8192 and 16 384 nodes with a 12 s time step to
obtain a stable model run. The measured real times for the three 12 s runs
are very close to 1.5 times the real times for the corresponding 18 s runs,
which gives us confidence that we can scale the results for
24 576 and 28 672 nodes with an 18 s time step to
a 12 s time step, despite the fact that the runs with an 18 s time step produced NaNs.
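The scaling factor of 1.5 follows simply from the number of integration steps per model hour; as a trivial check:

    # A 12 s time step requires 18/12 = 1.5 times as many steps per
    # model hour as an 18 s time step, hence ~1.5x the real time.
    factor = 18 / 12
    print(f"expected slowdown: {factor}x")  # 1.5x
    for nodes, t18 in ((24576, 335), (28672, 360)):
        print(f"{nodes} nodes: ~{t18 * factor:.0f} s per model hour at 12 s steps")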
Table and
Fig. summarise the required times of the
individual steps of the 3 km runs with an 18 s integration time
step. Due to wall-time constraints, we only conduct runs without
writing output to the disk. The last column in
Table estimates how many hours the
3 km model can be advanced within 24 h wall time, and is calculated
as follows: a 12 s model integration time step is assumed, and the
real time required is scaled from the 18 s runs by a factor of 1.5 for
24 576 and 28 672 nodes, for which no 12 s runs are conducted. For a typical
production run, diagnostic output files of 13 GB size are written
every 3 h model time, while comprehensive output files of
approximately 250 GB size are written every 24 h model
time. A restart file of 2.1 TB size is written at the end of the
model run. Based on a parallel write performance of
0.6 GB s⁻¹, we make a conservative estimate that roughly
2 h of the 24 h wall time will be used up by writing these files
to the disk. Tables and list the file sizes and the
cheapest and fastest model runs for the 3 km mesh on Juqueen.
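The 2 h estimate can be reconstructed from the stated file sizes and the measured write rate; a sketch for the three longest runs from the table:

    # Rough reconstruction of the output-writing estimate: 13 GB of
    # diagnostics every 3 h model time, ~250 GB of comprehensive output
    # every 24 h model time, one 2.1 TB restart file, at ~0.6 GB/s.
    WRITE_RATE_GBS = 0.6   # measured parallel write performance

    def write_time_h(model_hours):
        volume_gb = (model_hours // 3) * 13 + (model_hours // 24) * 250 + 2100
        return volume_gb / WRITE_RATE_GBS / 3600

    for model_hours in (104, 141, 152):
        print(f"{model_hours:3d} model hours: ~{write_time_h(model_hours):.1f} h of writing")
    # ~1.6 h, ~1.8 h, ~2.0 h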
We conclude from this extreme scaling test that the dynamical solver
of MPAS scales on massively parallel systems out to hundreds of
thousands of cores. Our results confirm that the model behaves similarly
for the 3 km mesh as for the significantly smaller problem sizes
and that the parallel efficiency is limited by the same factors,
namely the increasing number of halo cells and the amount of communication
for a large number of tasks. This occurs
around 150 owned cells per task, which corresponds to roughly
27 300 nodes for the 3 km mesh. However, we find that the model
initialisation and the disk I/O become increasingly important and at
the same time difficult to improve for extremely large test
cases. Compared to the model integration, the time required for the
model initialisation and for reading and writing data is largely
independent of the number of tasks. For a maximum wall time of 24 h on
Juqueen, these steps consume up to 3 h or 10–15 % of the total
job time – in other words, hundreds of thousands of CPU h. The
cheapest run on Juqueen utilises 4096 nodes (4 racks, 65 536 tasks),
consumes 1.3 million CPU h for a 24 h model integration and gives a speed-up of
1.2 × real time. For 24 576 nodes (24 racks, 393 216 tasks),
a speed-up of 6.3 × real time is achieved at the slightly higher
expense of 1.5 million CPU h.
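These cost figures follow from the table above; a minimal reconstruction, assuming 16 cores per Juqueen node:

    # Speed-up = model hours advanced per 24 h wall time; the CPU h cost
    # of 24 h model time then follows from the core count (16 per node).
    CORES_PER_NODE = 16

    for nodes, model_hours in ((4096, 29), (24576, 152)):
        speedup = model_hours / 24
        cpu_h = nodes * CORES_PER_NODE * 24 / speedup
        print(f"{nodes:6d} nodes: speed-up {speedup:.1f}x, "
              f"{cpu_h / 1e6:.1f} million CPU h per 24 h model time")
    # 1.2x / 1.3 million CPU h and 6.3x / 1.5 million CPU h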
Conclusions
In this study, we analyse the atmospheric model MPAS-A in detail for
its numerical performance and for its physical accuracy. We conduct
scaling tests for three medium-size problems using regular and
variable meshes of different complexity on four different HPC
facilities. We confirm an overall good scaling (≳ 70 %
parallel efficiency) of MPAS across all systems and find that
a value of 150 cells owned by each task provides a good and quick
estimate for the breakdown of the parallel performance
when set-up and I/O costs are included in the scaling. This limit
depends weakly on the interconnect of the system (absolute but also
relative to the core speed), with faster interconnects corresponding
to lower values, but also on the I/O performance. Accordingly, it is
slightly lower for the Blue Gene system (between 100 owned cells
per task for the 60–12 km mesh and 150 owned cells per task
for the 3 km mesh). Taking into account that the set-up costs are
amortised over longer runs, MPAS-A maintains a parallel efficiency of
80 % or better for more than 150 owned cells (100 owned cells on Juqueen)
per task. Based on these findings, we provide numbers on the typical file sizes and
optimal model configurations for conducting research and operational
runs on the different HPC systems.
An in-depth analysis of the properties of different graph partitions
for one of the meshes shows that the impact of non-contiguous graph
partitions, in the form of changes in the number of halo cells, is
negligible for any reasonable number of tasks for a given problem
size. We further employ the parallel profiling tools Scalasca and
Score-P to identify the bottlenecks in the MPAS-A code when the
parallel performance breaks down. Our findings confirm that most of
the time in such cases is spent waiting during the communication with
neighbouring tasks, but also demonstrate the negative impact that
computational imbalances can have on the model performance.
We also study the accuracy of MPAS-A for one common and challenging
problem in climate research, namely the capability to reproduce the
dynamics of the WAM and its associated
precipitation. We conduct 11-month simulations for two meshes,
a regular 120 km mesh and a variable 60–12 km mesh, and compare the
model output to a number of observation data sets, selected
re-analyses and a reference model run. The reference model run is
chosen from a novel set of regional climate simulations over West
Africa within the framework of the WASCAL programme and employs the
regional climate model WRF, from which MPAS inherits several aspects
of the dynamical solver and all of its physical parameterisation
schemes.
We find that MPAS-A is able to model the monsoon dynamics and the
northwards movement of the monsoon rain band in this pilot study.
Despite using an out-of-the-box configuration of the model, both runs reproduce the
timing of the onset of the monsoon season better than the optimised
WRF reference run. We find that the precipitation in the
early monsoon season is overestimated, which we attribute to the
choice of physics parameterisations. The MPAS model runs also show
a cold bias in the near-surface temperature and consequently fail to
place the SHL at the correct location, which we believe
stems from a too short spin-up time of the model.
We would like to stress that our pilot study aims primarily at
investigating the feasibility of conducting climate modelling research
with MPAS for the West African region. In order to be able to draw
general conclusions on the ability of MPAS to reproduce observed
climatological properties, an ensemble of model runs over longer
time periods with sufficiently long spin-up times is required.
In the last part of this study, we conduct extreme scaling tests on
a global 3 km mesh with more than 65 million grid cells on up to
458 752 cores on Juqueen, the IBM Blue Gene/Q at the Forschungszentrum
Jülich. We describe the issues that arise when attempting such an
experiment for the first time – up to now, MPAS-A has been run for
real-data cases, which include a full physics suite, on a maximum
number of 16 384 tasks on Yellowstone – and provide solutions that
allow one to conduct the scaling test in the first place and improve the
model performance. We find that the model scales very well up to the
entire machine with a parallel performance of nearly 70 % for
458 752 tasks. We confirm that the limitations and rules to estimate
the scaling, derived for moderate problem sizes, are also valid for
the extremely large test case. This gives confidence for planning
model experiments and estimating required run times, storage and
computational resources.
Furthermore, we identify additional aspects in the model that become
increasingly relevant for larger problem sizes: the model
initialisation and the disk I/O. We describe strategies to improve the
performance of the model, which are partly machine dependent. We
further give estimates on required run times and resources for
conducting scientific experiments with the 3 km mesh on Juqueen.
Our next steps will be to conduct a number of longer simulation
experiments on regular and variable-resolution meshes with a moderate
number of grid cells. Specifically, we plan to pursue the study on the
dynamics of the WAM using a variable-resolution grid
such as the 60–12 km mesh and a regular grid with a similarly fine
resolution. This will allow us to compare the accuracy of the model
after a full spin-up of the soil conditions and to assess the impact
of the variable mesh on the model results. It will also allow us to
study physical processes such as the teleconnection between the oceans
and the African monsoon systems, and investigate the impact of climate
and land use changes in a consistent approach.
In conclusion, the MPAS-A model is a novel atmospheric model that
scales well on a range of architectures for small up to extremely
large numbers of tasks. Based on an unstructured Voronoi mesh, it
allows global simulations to be conducted with local refinement regions and
smooth transitions in between them. This makes it possible to study
local-scale processes in regions of interest with a full coupling to
the large-scale motions and a physical consistency within the
model. This is shown here in a pilot study of the WAM. We also demonstrate that it is possible to conduct global,
convection-resolving atmospheric simulations with MPAS on current and
future massively parallel systems. However, it is evident that
the application of models such as MPAS for extremely large problem
sizes and numbers of tasks requires substantial efforts to optimise the
model to the problem and to the machine it is run on. In
order to do so, interdisciplinary approaches and more intensive
training of scientists in hardware, software and programming techniques
are paramount.