Advances in coastal modeling and computation provide the opportunity to examine non-hydrostatic and compressible fluid effects at very small scales, but the cost of these new capabilities and the accuracy of these models versus trusted non-hydrostatic codes have yet to be determined. Here the Coastal and Regional Ocean COmmunity model (CROCO, v1.2) and the National Center for Atmospheric Research large-eddy simulation (NCAR-LES) model are compared, with a focus on their simulation accuracy and computational efficiency. These models differ significantly in numerics and capabilities, so they are run on common classic problems of surface-forced, boundary-layer turbulence. In terms of accuracy, we compare turbulence statistics, including the effect of the explicit subgrid-scale (SGS) parameterization, the effect of the second (dilatational) viscosity, and the sensitivity to the speed of sound, which is used as part of the CROCO compressible turbulence formulation. To gauge how far CROCO is from the NCAR-LES, we first compare the NCAR-LES with two other non-hydrostatic Boussinesq approximation LES codes (PALM and Oceananigans), defining the notion and magnitude of accuracy for the LES and CROCO comparison. To judge the efficiency of CROCO, strong and weak scaling simulation sets vary the problem size and the workload per processor, respectively. Additionally, the effects of the 2D decomposition of CROCO and NCAR-LES and of the supercomputer settings are tested. In summary, the differences between CROCO and the NCAR-LES are similar in magnitude to those between the NCAR-LES and the other LES codes. However, the additional capabilities of CROCO (e.g., nesting, non-uniform grid, and realism of ocean configuration in general) and its weakly compressible formulation come with roughly an order of magnitude of additional costs, despite efforts to reduce them by adjusting the second viscosity and speed of sound as far as accuracy allows.
However, a new variant of the non-hydrostatic CROCO formulation is currently undergoing prototype testing and should enable faster simulations by relaxing the stability constraint imposed by the free surface. Overall, when the additional features of CROCO are needed (nesting, complex topography, etc.), the additional costs are justified, while in idealized settings (a rectangular domain with periodic boundary conditions) the NCAR-LES is faster in arriving at nearly the same result.

Coastal ocean modeling using limited domain sizes and open boundaries has been a standard practice for decades

The Coastal and Regional Ocean COmmunity model (CROCO) is a modeling platform for the regional and coastal ocean primarily supported by French institutes working on environmental sciences and applied mathematics (IRD, INRIA, IFREMER, CNRS, and SHOM). Built on a version (ROMS_AGRIF) of the Regional Ocean Modeling System and the non-hydrostatic kernel of SNH (a pseudo-compressible solver developed in Toulouse), CROCO has the objective to resolve problems of very fine-scale coastal areas through nesting while at the same time operating as a standard coarse-resolution coastal modeling system

A snapshot of a horizontal component of the water velocity simulated by CROCO, showing how it changes with increasing depth and illustrating the turbulent behavior of the CROCO model simulation.

A non-hydrostatic solver is rarely incorporated into a coastal model such as CROCO, but some applications in small-scale coastal dynamics require non-hydrostatic capability. The scalings of the fluid equations for common oceanographic problems

Typically, non-hydrostatic ocean models also employ the Boussinesq approximation

In order to test the accuracy and computational efficiency of CROCO, an idealized ocean setting is applied as a benchmark

In this paper, the comparisons between CROCO and NCAR-LES are divided into two major aspects: model prediction accuracy (Sect.

In this section, we compare the turbulence statistics simulated by the NCAR boundary-layer LES model

All simulations in this section use the following configuration. The grid has 256 uniformly spaced points in each direction (including the NCAR-LES pseudo-spectral collocation grid). The domain size is 320

The NCAR-LES model uses a two-part SGS eddy viscosity model of

The CROCO NBQ model offers several options for the SGS parameterizations. In this paper, we consider two options, namely the use of only numerical diffusion and the SGS model of

Here, we compare the NCAR-LES model with the CROCO NBQ model. As we will see shortly, the results show that these two models produce very similar boundary-layer flows, with differences comparable to those among the different LES (Sect.

The CROCO model uses a time-splitting method and uses two different time steps for the so-called fast and slow modes. In this subsection, all of the CROCO runs use a slow-mode time step of 0.5

The CROCO NBQ model has two constants related to the fast mode, namely the speed of sound

Comparison between the NCAR-LES model (solid) and the CROCO NBQ model (dashed). The line color indicates the surface forcing as shown in the legend.

Comparison between the NCAR-LES model (solid) and the CROCO NBQ model (dashed). The line color indicates the surface forcing as shown in the legend.

Figures

Each profile is an average of 21 samples taken every 1/40 (about 25 min) of the inertial period during

To understand these figures, let us first explain the non-dimensionalization used. Figures

Here, for notational simplicity, we use a positive value when energy is coming into the water.

For ease of notation, we use an energy-flux-based velocity scale. While

In Fig.

The length scale

The factor 4.5 is an empirical non-dimensional coefficient. Equation (

When the wind energy mixes the surface water very well and thereby significantly depletes the available potential energy due to the surface cooling, it may be more appropriate to use

In Fig.

Figure

Figure

Figure

The CROCO runs produced weaker mixed-layer deepening, although Fig.

That is, the normalized-buoyancy frequency of the pycnocline tends to be smaller for the CROCO runs, while the dimensional

Time series of the mixed-layer base depth

Figures

Figures

To further investigate this difference, the spectra of 1D discrete fast Fourier transform (FFT) modes and the circularly integrated 2D energy spectra of

Comparison of the 1D discrete FFT spectra with

Comparison of the 2D spectra averaged in circular rings at constant horizontal wavenumber magnitude from the runs with

This subsection shows how explicit SGS diffusion terms affect the results in Sect.

Time series of the mixed-layer base depth

The previous subsection also showed that the resolved turbulence quantities near the surface tend to be larger with the CROCO model without an explicit SGS parameterization. This difference also significantly reduces with the addition of the explicit SGS parameterization, as shown in Figs.

Each profile is an average of 21 samples taken every 1/40 of the inertial period during

Comparison between the NCAR run (solid) and the C3VS CROCO run (dashed) including explicit SGS dissipation with

Comparison between the NCAR run (solid) and the C3VS CROCO run (dashed) including explicit SGS dissipation with

A stronger near-surface diffusion weakens the resolved turbulence. There are some small remaining differences, but they are expected because different explicit SGS parameterizations are used in the NCAR and CROCO models.

In summary, the NCAR results and the CROCO results are overall very comparable. There are some minor differences, but most of them are due to the different SGS parameterization. The only notable difference that may not be attributable to the SGS parameterization difference is that the NCAR model runs tend to produce more internal waves in the stratified part.

For the CROCO NBQ model runs, an unphysically large value of the second viscosity

By increasing

Reducing the speed-of-sound parameter

Most statistics have only small differences that should be considered negligible for the given limited domain size (not shown). The only possibly non-negligible difference appears in the internal wave strength seen below

Decreasing

NCAR-LES was developed at NCAR to simulate planetary-boundary-layer turbulence

The Parallelized Large-Eddy Simulation Model (PALM) was developed at Leibniz Universität Hannover (Germany) as a turbulence-resolving LES model for atmospheric- and oceanic-boundary-layer flows, specifically designed to run on massive parallel computer architectures

Both NCAR-LES and PALM have been widely used in simulating atmospheric- and oceanic-boundary-layer turbulence under various idealized and realistic conditions, while Oceananigans is a new (v0.83.0 is used here), fast, and user-friendly software package for numerical simulations of geophysical fluid dynamics developed at the Massachusetts Institute of Technology in the Julia programming language

Ideally, the differences in the discretization and SGS closure schemes among the three LES models should not affect the horizontally and temporally averaged turbulence statistics for the ocean-surface-boundary-layer problem, as long as the grid cells are small enough to capture the dominant turbulent structures and the model domain is large enough to collect robust statistics. Here we assess to what extent this assumption is valid using two idealized cases: a case dominated by wind-driven shear turbulence with

Figures

A comparison of the horizontally and temporally averaged turbulence statistics among NCAR-LES (solid), PALM (dashed), and Oceananigans (dotted) in a case dominated by wind-driven shear turbulence. The normalized turbulence statistics include

Same as Fig.

In reproducing

These comparisons spanned a wider range of convective forcing (over a factor of 100) and a wider range of wind stresses (a factor of 4) than the comparisons in the previous sections. The largest differences among the simulations, however, were consistent with the preceding results. According to the comparison of horizontal (

Many factors affect the computing efficiency of a model, such as the structure and allocation of the computing platform, the MPI parallelization, the 2D decomposition of the model, and some specific physical parameterizations, particularly ones that have consequences for the stability and allowable time step size. In this section, we compare the computational efficiency of the CROCO and NCAR-LES models.

The number and allocation of nodes and processors used for computing and the availability of threads matter for the model efficiency. In this study, the Cheyenne supercomputer is used for efficiency tests. The Cheyenne supercomputer, built for NCAR, operates as one of the world’s most energy-efficient and high-performance computers. Cheyenne consists of 4032 dual-socket nodes with 2.3 GHz Intel Xeon E5-2697V4 processors with 18 cores each for a total of 145 152 cores and a peak performance of 5.34 PetaFLOPS. Nodes have either 64 GB or 128 GB RAM (DDR4-2400) and are networked using Mellanox EDR InfiniBand high-speed interconnects with a bandwidth of 25 GBps, bidirectional, per link. The simulations presented in this paper all ran on Cheyenne with exclusive use of the nodes. In each efficiency test, the number of nodes, the number of CPUs per node, the number of MPI processes, and the number of OpenMP threads can be specified.
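The core count and peak performance quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the node, socket, core, and clock figures come from the text, while the value of 16 double-precision floating-point operations per core per cycle is an assumption that is not stated in this paper.

```python
# Back-of-the-envelope check of the Cheyenne figures quoted in the text.
NODES = 4032            # from the text
SOCKETS_PER_NODE = 2    # "dual-socket", from the text
CORES_PER_SOCKET = 18   # from the text
CLOCK_GHZ = 2.3         # from the text

# ASSUMPTION (not stated in the text): each core can retire
# 16 double-precision floating-point operations per clock cycle.
FLOPS_PER_CYCLE = 16

total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET

# cores x GHz x FLOPs/cycle gives GFLOPS; divide by 1e6 to get PetaFLOPS.
peak_pflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1.0e6

print("total cores:", total_cores)
print("estimated peak (PFLOPS):", round(peak_pflops, 2))
```

Under that assumption, the estimate is consistent with the 145 152 cores and 5.34 PetaFLOPS stated above.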

Combinations of nodes and CPUs per node were tested for different problem sizes and total numbers of processors. When the problem size and total number of processors are fixed, we find that using more nodes with fewer CPUs per node makes the CROCO model compute more efficiently per processor in use. This is not more efficient overall, however, because most systems still charge for the unused processors on each node. In addition, jobs that request more nodes on Cheyenne wait longer in the queue, so the overall time to complete runs can be longer even though the computing time is shorter. It is therefore typically better to stick to affordable and moderate numbers of nodes despite the higher per-processor performance of sparser layouts.
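The trade-off between per-processor efficiency and charged cost can be made concrete with a small accounting sketch. The function names and the wall-clock times below are hypothetical and chosen only for illustration; the 36 cores per node follow from the dual-socket, 18-core description of Cheyenne, and whole-node charging is assumed, as described above.

```python
CORES_PER_NODE = 36  # 2 sockets x 18 cores, from the Cheyenne description


def layout_cost(total_ranks, ranks_per_node, wall_hours,
                cores_per_node=CORES_PER_NODE):
    """Return (nodes, used core-hours, charged core-hours) for one job.

    Assumes whole nodes are reserved exclusively, so every core on a
    reserved node is charged whether or not it runs an MPI rank.
    """
    nodes = -(-total_ranks // ranks_per_node)   # ceiling division
    used = total_ranks * wall_hours
    charged = nodes * cores_per_node * wall_hours
    return nodes, used, charged


# Hypothetical timings: the same 144-rank job, packed vs. spread out.
packed = layout_cost(144, 36, 1.0)   # all 36 cores used on each node
spread = layout_cost(144, 18, 0.8)   # half the cores idle, 20 % faster

for name, (nodes, used, charged) in (("packed", packed), ("spread", spread)):
    print(f"{name}: nodes={nodes} used={used:.1f} charged={charged:.1f}")
```

With these made-up numbers the spread layout finishes sooner and uses fewer core-hours per active processor, yet it is charged for more core-hours in total, which is the distinction drawn in the paragraph above.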

NCAR-LES uses pseudo-spectral discretization in the horizontal. Fast Fourier transforms (FFTs) are used to evaluate horizontal derivatives, which requires global data at all grid points in the direction along which the derivatives are evaluated. Thus, a simple domain decomposition in the two horizontal directions would need the frequent exchange of a large amount of data between different processors, which limits the computational performance. To address this, a 2D domain decomposition is used in NCAR-LES
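To illustrate why a pseudo-spectral derivative needs global data along the differentiated direction, the following minimal sketch differentiates a periodic 1D signal through a discrete Fourier transform. It is not the NCAR-LES implementation: the function names are hypothetical, a naive O(N²) transform from the Python standard library stands in for an optimized FFT, and the grid size and domain length are arbitrary.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def spectral_derivative(u, length):
    """d/dx of a periodic signal sampled on n uniform points over `length`."""
    n = len(u)
    U = dft(u)                      # every coefficient depends on ALL points
    dU = []
    for k in range(n):
        m = k if k < n / 2 else k - n      # signed mode number
        if n % 2 == 0 and k == n // 2:
            m = 0                          # drop the ambiguous Nyquist mode
        dU.append(1j * (2.0 * math.pi * m / length) * U[k])
    return [z.real for z in idft(dU)]


# Hypothetical test signal: one sine wave across a unit-length domain.
N, L = 16, 1.0
x = [L * j / N for j in range(N)]
u = [math.sin(2 * math.pi * xi / L) for xi in x]
du = spectral_derivative(u, L)
exact = [(2 * math.pi / L) * math.cos(2 * math.pi * xi / L) for xi in x]
max_err = max(abs(a - b) for a, b in zip(du, exact))
print("max error:", max_err)
```

Because each Fourier coefficient sums over every point along the line, a processor holding only part of that line cannot compute the derivative without communication, which is the constraint the decomposition strategy has to work around.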

2D domain decomposition on nine processors.

CROCO currently supports two parallelization options, MPI and OpenMP, which represent distributed-memory and shared-memory parallelism, respectively. The MPI or OpenMP settings in CROCO must be defined explicitly as needed, and the two options are mutually exclusive. According to the test results, when OpenMP is not enabled during compilation in CROCO, requesting OpenMP threads on Cheyenne does not change the computing time, so it offers no advantage. In this paper, CROCO is used with MPI and without OpenMP, which means only one thread is used for each processor on Cheyenne, and the decomposition of processors and their distribution across nodes impact the computing efficiency. The following discussion focuses on the MPI parallelization option.

The CROCO MPI decomposition is specified in the XI and ETA directions through the parameters NP_XI and NP_ETA, which set the number of processors assigned in the XI and ETA horizontal directions, respectively. When NP_XI

Model efficiency for varying problem sizes and workloads per processor is shown in Figs.

In

Figure

Figure

In the weak scaling comparison, significant experimentation using different 2D decompositions for models, different node configurations, and different CPU_per_node choices was carried out to optimize these settings for each processor count and computation size. The structure of the optimal processor grid distributions is not always a square layout. It is possible, but unlikely, that a more efficient configuration exists that was not tried. These aspects affect the workload per processor and also the comparison results and are the reason why some scaling results are slightly more or less efficient than expected in certain configurations.
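The search over 2D decompositions can be sketched as an enumeration of candidate processor grids. The example below is a hypothetical helper, not part of CROCO or NCAR-LES: it lists every (NP_XI, NP_ETA) pair for a given processor count and reports the resulting tile size for a 256 × 256 horizontal grid, together with a crude perimeter-based halo estimate whose halo width of 2 points is an assumption.

```python
def processor_grids(n_procs):
    """All (NP_XI, NP_ETA) pairs whose product equals n_procs."""
    return [(p, n_procs // p) for p in range(1, n_procs + 1) if n_procs % p == 0]


def tile_shape(nx, ny, np_xi, np_eta):
    """Largest tile (ceiling division) when nx x ny points are split."""
    return (-(-nx // np_xi), -(-ny // np_eta))


def halo_points(nx, ny, np_xi, np_eta, halo=2):
    """Rough number of halo points exchanged per tile and per level.

    ASSUMPTION: a halo width of 2 points on each side; corners ignored.
    """
    tx, ty = tile_shape(nx, ny, np_xi, np_eta)
    return 2 * halo * (tx + ty)


# 36 processors (one full Cheyenne node) on a 256 x 256 horizontal grid.
for np_xi, np_eta in processor_grids(36):
    shape = tile_shape(256, 256, np_xi, np_eta)
    print(np_xi, np_eta, shape, halo_points(256, 256, np_xi, np_eta))
```

A perimeter estimate like this one favors square layouts. It ignores memory access patterns, node boundaries, and load imbalance from uneven tiles, which is one plausible reason why the measured optimum is not always square and why direct experimentation was needed.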

Computational time per grid point per time step for different combinations of problem size for CROCO (solid) and NCAR-LES (dashed), as an example of strong scaling. NDTFAST

Computational time per grid point for a fixed amount of work (i.e., same number of slow time steps and grid points) per processor (an example of weak scaling) with 11 fast (barotropic and pseudo-acoustic) time steps per slow (baroclinic) time step

In order to evaluate the performance of the ocean model CROCO with non-hydrostatic kernels, this paper uses NCAR-LES as a benchmark for comparison. The study begins with a comparison of several different LES codes, and then, because of their close agreement, only NCAR-LES is used elsewhere. The two aspects of the comparison between CROCO and NCAR-LES are simulation accuracy and computational efficiency.

In the accuracy tests, the effect of the explicit SGS parameterization, the second viscosity parameter, and the speed-of-sound parameter are varied to understand these key factors impacting simulation accuracy. Once these parameters are considered, the NCAR-LES results and the CROCO results are overall within expected variations. The simulated mean flows are very similar. The only notable differences are (1) that the CROCO surface velocity tends to be slightly higher, (2) that the CROCO surface temperature tends to be slightly lower, and (3) that the CROCO pycnocline entrainment is weaker. These effects are best explained by noting that CROCO's numerical diffusion is weaker than the explicit SGS plus implicit diffusion of the NCAR model. The NCAR runs have stronger internal waves (contributing no buoyancy flux when statistically steady) and less resolved turbulent mixing. There are other minor differences, but most of them are expected due to the different SGS parameterization and limited averaging windows. Overall, the differences between CROCO and the NCAR-LES are similar to the differences between three different LES codes. The only notable difference that may not be attributable to the SGS parameterization difference is that the NCAR model runs tend to produce more internal waves where higher stratification is present, a result that is also sensitive to the speed-of-sound setting in CROCO. As for the effect of the second (dilatation) viscosity parameter, increasing

In the efficiency tests on the Cheyenne supercomputer platform, the performance of CROCO and NCAR-LES was compared under weak and strong scaling, including the effects of their computational parallelization and 2D decomposition. The strong scaling represents the computational time per grid point per time step for different combinations of processors for each problem size, and the weak scaling represents computational time per grid point for a fixed amount of work per processor. In both cases, the computational efficiency of CROCO and NCAR-LES per time step is comparable. The number of fast subcycle time steps in CROCO affects its efficiency, with CROCO ranging from twice to half as expensive as NCAR-LES per time step. To sum up, CROCO and NCAR-LES are comparable in simulation accuracy and in computational efficiency per time step.
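The scaling metrics described above can be written down compactly. The following sketch uses hypothetical function names and made-up wall-clock times purely for illustration; the normalization shown (wall time divided by grid points and time steps) is one reasonable reading of "computational time per grid point per time step" and may differ in detail from the one used for the figures.

```python
def time_per_point_per_step(wall_seconds, n_points, n_steps):
    """Wall-clock time normalized by grid points and time steps."""
    return wall_seconds / (n_points * n_steps)


def strong_scaling_efficiency(t_ref, p_ref, t, p):
    """Fixed problem size: ideal scaling gives t = t_ref * p_ref / p."""
    return (t_ref * p_ref) / (t * p)


def weak_scaling_efficiency(t_ref, t):
    """Fixed work per processor: ideal scaling gives t = t_ref."""
    return t_ref / t


# Hypothetical timings, not measurements from this study.
n_points, n_steps = 256 ** 3, 100
t36, t144 = 400.0, 125.0          # seconds on 36 and 144 processors

cost36 = time_per_point_per_step(t36, n_points, n_steps)
cost144 = time_per_point_per_step(t144, n_points, n_steps)
strong = strong_scaling_efficiency(t36, 36, t144, 144)
weak = weak_scaling_efficiency(400.0, 500.0)

print("time per point per step:", cost36, cost144)
print("strong-scaling efficiency:", strong)
print("weak-scaling efficiency:", weak)
```

An efficiency of 1 corresponds to ideal scaling; values below 1 indicate communication or load-balance overheads of the kind discussed in the efficiency section.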

However, in these idealized test cases where the advantages of a weak compressibility approach to realistic simulations (where CROCO has specialized capabilities) are unimportant, these capabilities limit the time step in CROCO to be 6 to 14 times smaller, depending on the strength of the forcing. CROCO optimizations are ongoing and will be documented in future publications; using the Runge–Kutta version of CROCO may allow approximately a doubling of time step length

The only simulation result difference between CROCO and NCAR-LES that was not attributable to the SGS parameterization differences is that NCAR-LES tends to produce slightly more internal waves. The CROCO solutions were found to be insensitive to the values of the second viscosity and the speed of sound over wide ranges. Therefore, an artificially large value of the second viscosity and an artificially small value of speed of sound were used to increase the time step stably and accurately, as long as the speed of sound is faster than the speed of the fastest process that needs to be properly simulated (this constraint will be eased with the new variant currently being tested). Overall, when the additional features of CROCO are needed (nesting, complex topography, free surface, etc.), these additional costs can be justified, while in idealized settings, in a rectangular domain with mathematically well-defined periodic boundary conditions, the NCAR-LES is faster at arriving at nearly the same result.
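The link between the speed-of-sound parameter and the cost of the fast mode can be illustrated with a generic Courant-type estimate. This is a sketch under stated assumptions, not the stability criterion actually implemented in CROCO: the function name, the Courant number of 1, and all numerical values are hypothetical.

```python
import math


def fast_steps_per_slow_step(dt_slow, dx, sound_speed, courant=1.0):
    """Fast substeps needed so that sound_speed * dt_fast / dx <= courant.

    ASSUMPTION: a simple explicit acoustic Courant condition with a
    single grid spacing dx; real solvers may use a different limit.
    """
    return math.ceil(dt_slow * sound_speed / (courant * dx))


# Hypothetical values: slow step of 0.5 time units, unit grid spacing.
dt_slow, dx = 0.5, 1.0
for c in (1500.0, 150.0, 30.0):
    n_fast = fast_steps_per_slow_step(dt_slow, dx, c)
    print(f"sound speed {c:7.1f} -> {n_fast} fast steps per slow step")
```

In this simplified picture the number of fast substeps scales linearly with the sound speed, which is why lowering the speed-of-sound parameter, within the accuracy limits described above, reduces the cost per slow time step.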

The latest CROCO ROMS code is available for download at

The simulation data underlying this paper are available from the permanent archive of the Brown Digital Repository at

XF, NS, QL, and BFK conceived the project. XF, NS, and QL ran the simulations. PM, FA, and PS contributed expertise on the CROCO and NCAR-LES models. All authors participated in the writing. XF organized the writing contributions. This project is XF's Master's degree project, and BFK is her advisor.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Baylor Fox-Kemper and Xiaoyu Fan and computing at Brown University were supported by the National Science Foundation (NSF) (grant no. 1655221). Time on Cheyenne was supplied by the Computational and Information Systems Laboratory at the National Center for Atmospheric Research

This research has been supported by the National Science Foundation Office of Integrative Activities no. 1655221.

This paper was edited by P. N. Vinayachandran and reviewed by two anonymous referees.