Effective and accurate ocean and coastal wave predictions are necessary for engineering, safety and recreational purposes. Refining predictive capabilities is increasingly critical to reduce the uncertainties associated with a changing global wave climatology. Simulating WAves in the Nearshore (SWAN) is a widely used spectral wave modelling tool employed by coastal engineers and scientists, including for operational wave forecasting purposes. Fore- and hindcasts can span hours to decades, and a detailed understanding of the computational efficiencies is required to design optimized operational protocols and hindcast scenarios. To date, there exists limited knowledge on the relationship between the size of a SWAN computational domain and the optimal number of parallel computational threads/cores required to execute a simulation effectively. To test the scalability, a hindcast cluster of 28 computational threads/cores (1 node) was used to determine the computational efficiency of a SWAN model configuration for southern Africa. The model extent and resolution emulate the current operational wave forecasting configuration developed by the South African Weather Service (SAWS). We implemented and compared both the OpenMP (shared-memory) and the Message Passing Interface (MPI; distributed-memory) parallel versions of SWAN. Three sequential simulations (corresponding to typical grid cell numbers) were compared to various permutations of parallel computations using the speed-up ratio, time-saving ratio and efficiency tests. Generally, a computational node configuration of six threads/cores produced the most effective computational set-up based on wave hindcasts of 1-week duration. The use of more than 20 threads/cores resulted in a decrease in speed-up ratio for the smallest computational domain, owing to the increased sub-domain communication times for limited domain sizes.
The computational efficiency of metocean (meteorology and oceanography) modelling has been the topic of ongoing deliberation for decades. The applications range from long-term atmospheric and ocean hindcast simulations to the fast-responding simulations related to operational forecasting. Long-duration simulations are usually associated with climate-change-related research, with simulation periods of at least 30 years across multiple spatial and temporal resolutions needed to capture key oscillations (Babatunde et al., 2013). Such hindcasts are frequently used by coastal and offshore engineering consultancies for purposes such as those related to infrastructure design (Kamphuis, 2020) or environmental impact assessments (Frihy, 2001; Liu et al., 2013).
Operational (or forecasting) agencies are usually concerned with achieving simulation speeds that would allow them to accurately forewarn their stakeholders of immediate, imminent and upcoming metocean hazards. The main stakeholders are usually other governmental agencies (e.g. disaster response or environmental affairs departments), commercial entities, and the public. Both atmospheric and marine forecasts share similar numerical schemes that solve the governing equations and thus share a similar need in computational efficiency. Fast simulation times are also required for other forecasting fields such as hydrological dam-break models (e.g. Zhang et al., 2014). Significant advancement in operational forecasting can be made by examining the way in which the code interfaces with the computation nodes, and how results are stored during simulation.
Numerous operational agencies (both private and public) make use of Simulating Waves in the Nearshore (SWAN) to predict nearshore wave dynamics (refer to Genseberger and Donners, 2020, for details regarding the SWAN numerical code and solution schemes). These agencies include the South African Weather Service (e.g. Rautenbach et al., 2020), MetOcean Solutions (a division of the MetService of New Zealand) (e.g. de Souza et al., 2020), the United Kingdom Met Office (e.g. O'Neill et al., 2016) and the Norwegian Meteorological Institute (e.g. Jeuring et al., 2019). In general, these agencies have substantial computational facilities but nonetheless still face the challenge of optimizing the use of their computational clusters between various models (being executed simultaneously). These models may include atmospheric models (e.g. the Weather Research and Forecasting (WRF) model), hydrodynamic models (e.g. Regional Ocean Modeling System (ROMS) and the Semi-implicit Cross-scale Hydroscience Integrated System Model (SCHISM)) and spectral wave models (e.g. Wave Watch III (WW3) and SWAN; Holthuijsen, 2007; The SWAN team, 2006, 2019a, b). Holthuijsen (2007) presents a theoretical background to the spectral wave equations, wave measurement techniques and statistics as well as a concluding chapter on the theoretical analysis of the SWAN numerical model. There must also be a balance between hindcast and forecast priorities and client needs. Some of these agencies use a regular grid (instead of irregular grids, e.g. Zhang et al., 2016), with nested domains in many of their operational and hindcast projects. Here, we focus only on the computational performance of a structured regular grid (typically implemented for spectral wave models).
Kerr et al. (2013) performed an inter-model comparison of computational efficiencies by comparing SWAN, coupled with ADCIRC, and the NOAA official storm surge forecasting model, Sea, Lake, and Overland Surges from Hurricanes (SLOSH); however, they did not investigate the optimal thread usage of a single model. Other examples of coupled wave and storm surge model computational benchmarking experiments include Tanaka et al. (2011) and Dietrich et al. (2012), who used unstructured meshes to simulate waves during Hurricanes Katrina, Rita, Gustav and Ike in the Gulf of Mexico. Results from these models were presented on a log–log scale, and their experimental design tested computational thread numbers not easily obtainable by smaller agencies and companies. The latter instead require comparisons of sequential versus parallel computational efficiencies using smaller-scale efficiency metrics. Genseberger and Donners (2015) explored the scalability of SWAN using a case study focused on the Wadden Sea in the Netherlands. By investigating the efficiency of both the OpenMP (OMP) and MPI versions of the then-current SWAN, they found that the OpenMP version was more efficient on a single node. They also proposed a hybrid version of SWAN to combine the strengths of both implementations: using OpenMP to share memory more optimally and MPI to distribute memory over the computational nodes.
Here we build on the case study of Genseberger and Donners using results produced in the present study for southern Africa, to answer the following research questions: (1) when using SWAN, is it always better to have as many threads/cores as possible available to solve the problem at hand? (2) What is the speed-up relationship between the number of threads/cores and the computational grid size? (3) At what point (number of threads/cores) do the sub-domain communications start to make the whole computation less effective? (4) What is the scalability of a rectangular computational grid in a typical SWAN set-up?
Details of the model configuration can be found in Rautenbach et al. (2020a, b). The computational domain (refer to Fig. 1) and physics used here were the same as presented in those studies.
Figure 1. SWAN model extent and associated bathymetry. The locations of all the major coastal towns are also provided via acronyms as follows: Port Nolloth (PN), Saldanha Bay (SB), Cape Town (CT), Mossel Bay (MB), Port Elizabeth (PE), East London (EL), Durban (DN) and Richards Bay (RB).
All computations were performed on Intel Xeon Gold E5-2670 computational nodes running at 2.3 GHz. A total of 28 threads/cores, each with 96 GB RAM, were used with a 1 Gbyte inter-thread communication speed. Given that the present study was performed using a single computational node, inter-node communication speeds are not considered. Thus, given a computational node with similar processing speed, the present study should be reproducible. In general, these node specifications are reasonably standard, and therefore the present study is an acceptable representation of the SWAN scalability parameters.
SWAN 40.91 was implemented with the Van der Westhuysen whitecapping formulation (van der Westhuysen et al., 2007) and Collins bottom friction correlation (Collins, 1972) with a coefficient value of 0.015. Fully spectral wave boundary conditions were extracted from a global Wave Watch III model at 0.5 geographical degree resolution.
Here, the main aim was not the validation of the model but rather to quantify the relative computational scalabilities, as described at the end of the previous section. However, it should be noted that no nested domains were employed during the present study. Only the parent domain was used as a measure for scalability. The computational extent given in Rautenbach et al. (2020a, b) contains numerous non-wet grid cells that are not included in the computational expense of the current study. In Table 1, the size of the computational domain and resolution together with the labelling convention is given. For clarity, we define the resolutions as low, medium and high, denoted L, M and H, respectively, in the present study (noting that given the domain size, these resolutions would be classified as intermediate to high regional resolution for operational purposes).
Table 1. SWAN grid resolution, grid cell numbers and reference labels.
The test of scalability used here was the ability of the model to respond to an increased number of computations with an increasing amount of resources. In the present study these resources are computational threads/cores. Computations were performed for an arbitrary week to assess model performance. Model spin-up was done via a single stationary computation. The rest of the simulation was performed as a non-stationary computation with an hourly time step, which implied that wind-wave generation within the model occurred on the timescale of the wind forcing resolution. The grid resolutions used in the present study corresponded to 0.1, 0.0625 and 0.05 geographical degrees. Local bathymetric features were typically resolved through downscaled, rotated, rectangular grids, following the methodology employed by Rautenbach et al. (2020a). A nested resolution increase of more than 5 times is also not recommended, given that the regional model is nested in the global Wave Watch III output at 0.5 geographical degree resolution (Rautenbach et al., 2020a). Given these constraints, these resolutions represent a realistic and typical SWAN model set-up, for both operational and hindcast scenarios.
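As a rough illustration of how the three resolutions translate into computational load, the sketch below estimates the relative number of grid cells for a fixed model extent. It assumes that the labels L, M and H correspond to 0.1, 0.0625 and 0.05 degrees, respectively, and that the cell count scales with the inverse square of the resolution; it ignores the exclusion of non-wet cells, so the actual cell numbers in Table 1 will differ. The function name `relative_cell_count` is hypothetical.

```python
# Assumed mapping of the labels used in this study to grid resolutions (degrees).
resolutions_deg = {"L": 0.1, "M": 0.0625, "H": 0.05}


def relative_cell_count(res_deg, reference_deg=0.1):
    """Cell count relative to the reference resolution for a fixed extent.

    For a regular grid over a fixed area, the number of cells scales with
    (1 / resolution)^2. Land (non-wet) cells are not accounted for here.
    """
    return (reference_deg / res_deg) ** 2


relative = {label: relative_cell_count(res) for label, res in resolutions_deg.items()}

for label, ratio in relative.items():
    print(f"{label}: about {ratio:.2f} times the cells of the 0.1 degree grid")
```

Under these assumptions the medium and high resolutions carry roughly 2.6 and 4 times as many cells as the low resolution.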
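The three main metrics for estimating computational efficiency are the speed-up ratio, the time-saving ratio and the efficiency.
The speed-up ratio is given as $S_p = T_1 / T_p$, where $T_1$ is the wall-clock time of the sequential computation on a single thread/core and $T_p$ is the wall-clock time when $p$ threads/cores are used.
The time-saving ratio is given by $TS_p = (T_1 - T_p) / T_1$, and the efficiency by $E_p = S_p / p$.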
The scalability of SWAN was tested based on the speed-up ratios for the grid resolutions in Table 1.
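The sketch below shows how such metrics might be computed from measured wall-clock times. It assumes the standard definitions (speed-up as serial time over parallel time, time saving as the fraction of serial time saved, and efficiency as speed-up per thread/core). The function name `scaling_metrics` and the timings are hypothetical and are not results from this study.

```python
def scaling_metrics(t_serial, t_parallel, n_threads):
    """Return (speed-up ratio, time-saving ratio, efficiency).

    t_serial   : wall-clock time of the sequential run (seconds)
    t_parallel : wall-clock time of the run on n_threads threads/cores (seconds)
    """
    if t_serial <= 0 or t_parallel <= 0 or n_threads < 1:
        raise ValueError("times must be positive and n_threads at least 1")
    speed_up = t_serial / t_parallel
    time_saving = (t_serial - t_parallel) / t_serial
    efficiency = speed_up / n_threads
    return speed_up, time_saving, efficiency


# Hypothetical wall-clock times in seconds, keyed by thread/core count.
timings = {1: 3600.0, 2: 1900.0, 6: 750.0, 28: 400.0}

for p, t in timings.items():
    s, ts, e = scaling_metrics(timings[1], t, p)
    print(f"{p:2d} threads: speed-up {s:5.2f}, time saving {ts:5.2f}, efficiency {e:5.2f}")
```

Reporting all three values side by side makes it easier to see when additional threads/cores still shorten the run but no longer use the hardware efficiently.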
Zafari et al. (2019) recently
presented some of the first results investigating the effect of different
compilers on the scalability of a shallow water equation solver. Their
experiments compared a model compiled with GNU Compiler Collection (gcc)
7.2.0 and linked with OpenMPI and Intel C
From a practical point of view, regular SWAN grids will rarely be used at
resolutions exceeding those presented in the previous section. The
reason for this statement is twofold: (1) downscaling a spectral wave model
from a global resolution to a regional resolution should not exceed a
5-fold refinement factor and (2) when reasonably high resolutions are
required in the nearshore (to take complex bathymetric features into
account), nested domains are preferred. The reasoning will be different for
an unstructured grid approach
(Dietrich et al., 2012).
Given these limitations of the widely used structured SWAN grid approach,
SWAN will almost exclusively be regarded as a model with a low spatial
computational demand. Small tasks create a sharp drop in performance via the Intel
C
In Fig. 2, the computational scalability of SWAN
is given as a function of number of computational threads/cores.
Figure 2a shows the computational time in seconds;
here the model resolutions group together with little
differentiation between them. These results also highlight the need for
performance metrics, as described in the previous section.
Figure 2b shows that the MPI version of SWAN is more
efficient for all the computational domain sizes. There is also a clear
grouping between OMP and MPI. Figure 2c presents
the speed-up ratios and clearly indicates that the MPI version of SWAN
outperforms the OMP version. The closer the results are to the
Figure 2. Model performance as a function of the number of computational threads/cores.
Near-linear speed-up is observed for a small number of computational threads/cores. This result agrees with those reported by Zafari et al. (2019). In Fig. 2d the results are expressed via the time-saving ratio. In this case, the curves start to asymptote with thread counts larger than approximately 6.
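One way to see why the time-saving curves flatten is the following worked bound. It assumes the standard definitions of the speed-up ratio, $S_p = T_1/T_p$, and the time-saving ratio, $TS_p = (T_1 - T_p)/T_1$, and it assumes that the speed-up does not exceed the ideal linear case, $S_p \le p$. These are illustrative assumptions, not measured results.

```latex
\[
TS_p = \frac{T_1 - T_p}{T_1} = 1 - \frac{1}{S_p} \le 1 - \frac{1}{p},
\qquad
1 - \tfrac{1}{6} \approx 0.83,
\qquad
1 - \tfrac{1}{28} \approx 0.96 .
\]
```

Even under ideal scaling, increasing the thread count from 6 to 28 can raise the time-saving ratio by only about 0.13, so most of the achievable saving is already realized at small thread counts.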
The behaviour noted in the results is similar to the dam-breaking
computational results reported by Zhang et al. (2014). Genseberger and Donners (2020)
present the latest finding on the scalability and benchmarking of SWAN.
However, their focus was quantifying the performance of their new hybrid
version of SWAN. In their benchmarking experiments (for the Wadden Sea, in
the Netherlands), they obtained results that differ from those in
Fig. 2a, with OMP generally producing faster
wall-clock computational times. They also considered the physical distances
between computational threads/cores and found that this parameter has a
negligible effect compared to differences between OMP and MPI, over an
increasing number of threads/cores. Their benchmarking also differed from
the results presented here as they only provided results as a function of
node number. Each one of their nodes consisted of 24 threads/cores. In the
present study, the benchmarking of a single node (28 threads/cores) is
evaluated compared with a serial computation on a single thread. For
benchmarking, without performance metrics, they found that the wall clock
times, for the iterations and not a full simulation, reached a minimum (for
large computational domains) at 16 nodes (16
The present study investigated the scalability of SWAN, a widely used
spectral wave model. Three typical wave model resolutions were used for
these purposes. Both the OpenMP (OMP) and the Message Passing Interface
(MPI) implementations of SWAN were tested. The scalability is presented via
three performance metrics: the efficiency, the speed-up ratio and the time-saving
ratio. The MPI version of SWAN outperformed the OMP version based on all
three metrics. The MPI version of SWAN performed best at the highest
computational domain resolution, resulting in the highest speed-up ratios.
The gains in the time-saving ratio levelled off after approximately six
computational threads/cores. This result suggests that six threads/cores
are the most effective configuration for executing SWAN. The largest
increases in speed-up and efficiency were observed with small thread counts.
According to Genseberger and Donners (2020),
computational times decrease up to
The open-source version of SWAN was run for the purposes of the present
study. SWAN may be downloaded from
CR conceptualized the study, executed the experiments and wrote the paper. He also secured the publication funding. JCM and KRB reviewed the paper.
The authors declare that they have no conflict of interest.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This research was funded by the National Research Foundation of South Africa (grant number 116359).
This paper was edited by Julia Hargreaves and reviewed by two anonymous referees.