Large-eddy simulation (LES) provides a physically sound approach to study complex turbulent processes within the atmospheric boundary layer, including urban boundary layer flows. However, such flow problems often involve a large separation of turbulent scales, requiring a large computational domain and very high grid resolution near the surface features, which leads to prohibitive computational costs. To overcome this problem, an online LES–LES nesting scheme is implemented into the PALM model system 6.0. The nesting method documented and evaluated here supports multiple child domains, which can be nested within their parent domain in either a parallel or a recursively cascading configuration. The nesting system is evaluated by simulating first a purely convective boundary layer and then three neutrally stratified flow scenarios of increasing topographic complexity. The results of the nested runs are compared with corresponding non-nested high- and low-resolution results. Within the high-resolution nest domain, the solution accuracy clearly improves, as the nested solutions approach the non-nested high-resolution reference results. In obstacle-resolving LES, two-way coupling becomes problematic because anterpolation introduces a regional discrepancy within the obstacle canopy of the parent domain. This is remedied by introducing canopy-restricted anterpolation, in which the operation is performed only above the obstacle canopy. The test simulations show that this approach is the most suitable coupling strategy for obstacle-resolving LES. The simulations also show that nesting can reduce the CPU time by up to 80 % compared to the fine-resolution reference runs, while the computational overhead of the nesting operations remained below 16 % for two-way coupling and was significantly lower for the one-way alternative.

Large-eddy simulation (LES) has been used for decades in basic research on atmospheric boundary layer (ABL) phenomena with idealized model setups. At present it is becoming an important method in applied research on realistic, very detailed, and complicated flow systems such as urban ABL problems

Many conventional continuum-based numerical solution methods (e.g., finite-element and finite-volume methods) allow variable resolution, so that the resolution can be concentrated in the area of principal interest and relaxed elsewhere. However, only unstructured grid systems allow full advantage to be taken of spatially variable resolution. Many general-purpose computational fluid dynamics packages offer unstructured grid systems, but in our experience such solvers are usually considerably less efficient computationally than ABL-tailored LES models, such as PALM

The idea of grid nesting is to run two or more LES model domains with different spatial extents and grid resolutions simultaneously. In this implementation, the outermost and coarsest-resolution LES domain (henceforth termed the “root” domain), which acts as a “parent” to its “child” domains, obtains its boundary conditions in the conventional manner, whereas each nested LES domain (child) always obtains its boundary conditions from its respective parent domain through interpolation. In one-way coupled nesting, only the children obtain information from their parents. With such a coupling strategy, the instantaneous child and parent solutions can deviate within the volume of the nest. If a stronger binding between the solutions is desired, the child solution needs to be incorporated into the parent solution. This is achieved in two-way coupled nesting, where the parent solutions are influenced by their children through so-called “anterpolation”

The child-to-parent anterpolation can be implemented using, for instance, the post insertion (PI) approach

Recently,

One-way coupled obstacle-resolving LES has been applied to a built environment by

The paper is organized as follows. Section

The PALM model system

Both the Boussinesq and the anelastic approximations require the flow to be incompressible. To enforce this, a predictor–corrector method is used in which an equation for the modified perturbation pressure is solved after every Runge–Kutta sub-time step

Parallelization of PALM is achieved by using the message passing interface

PALM offers several embedded models to simulate physical processes within the urban environment. Without intending to provide an exhaustive list, these include a land surface

The nesting system we have developed is based on the concept of parent and child domains. Each parent domain can enclose multiple child domains, but a child domain can, naturally, have only one parent domain. The top-level domain, also called the root domain, acts as the parent domain of the child domains at the first nesting level. The child domains at the first nesting level may in turn have child domains of their own, for which they then act as parent domains (cascading arrangement); see Fig.
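The parent–child hierarchy described above is a tree: every domain except the root has exactly one parent, parallel children share a parent, and cascading children hang below another child. The sketch below models this with a hypothetical `Domain` class (it is not part of PALM) and walks the tree depth-first so that each parent is visited before its children.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class Domain:
    """One model domain in a nesting hierarchy (hypothetical structure)."""
    name: str
    parent: Optional["Domain"] = field(default=None, repr=False)
    children: List["Domain"] = field(default_factory=list, repr=False)

    def add_child(self, name: str) -> "Domain":
        # A child is created by, and permanently attached to, exactly one parent.
        child = Domain(name=name, parent=self)
        self.children.append(child)
        return child

    @property
    def level(self) -> int:
        # The root domain is level 0; first-level children are level 1, etc.
        return 0 if self.parent is None else self.parent.level + 1

    def walk(self) -> Iterator["Domain"]:
        # Depth-first traversal: every parent is visited before its children.
        yield self
        for child in self.children:
            yield from child.walk()


root = Domain("root")
a = root.add_child("child_a")   # child_a and child_b are parallel children
b = root.add_child("child_b")
a1 = a.add_child("child_a1")    # cascading: child_a is also a parent
for d in root.walk():
    print("  " * d.level + f"{d.name} (level {d.level})")
```

A parent-before-child traversal order is convenient for any operation that must first be done on a parent and then on its children, such as setting up the nested grids level by level.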

A schematic example of a nested configuration involving both cascading and parallel child domains is shown on

The system is designed primarily for two-way coupled nesting, in which a child domain can affect its parent domain and vice versa. However, it is also possible to run the system in a one-way coupled mode, in which no feedback from the child domain is incorporated into its parent domain. Moreover, the system can be used for purely vertical, one-dimensional nesting, where the lowest part of the model (e.g., the atmospheric surface layer, where the dominant turbulent eddies are usually very small) is run as a child domain with a finer grid spacing than its parent domain, which encompasses the entire boundary layer. In the case of purely vertical nesting, cyclic boundary conditions must be set on all lateral boundaries. Unlike the method proposed by

The present nesting approach is a variant of the PI method, in which the communication between each parent–child couple is realized via interpolations (from parent to child) and anterpolations (from child to parent) after each Runge–Kutta sub-step and just before the pressure solver. The latter then ensures that mass conservation is enforced in the anterpolated solution in the parent domain.
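The ordering of the coupling actions within one Runge–Kutta sub-step can be written out explicitly. The following sketch only records action labels; the function `substep_actions` and the label strings are hypothetical and do not correspond to PALM routines. It assumes, as stated above, that interpolation and anterpolation both take place after the prognostic update and before the pressure solver.

```python
from typing import List, Tuple


def substep_actions(domains: List[str], couples: List[Tuple[str, str]],
                    two_way: bool = True) -> List[str]:
    """Ordered nesting actions within one Runge-Kutta sub-step (illustrative)."""
    # 1) Every domain computes tendencies and a provisional (predictor) update.
    actions = [f"predict:{d}" for d in domains]
    # 2) Each parent-child couple exchanges data after the update ...
    for parent, child in couples:
        actions.append(f"interpolate:{parent}->{child}")      # child boundary values
        if two_way:
            actions.append(f"anterpolate:{child}->{parent}")  # feedback to parent
    # 3) ... and only then does each domain solve for the pressure correction,
    #    so the anterpolated parent field is also made divergence-free.
    actions += [f"pressure:{d}" for d in domains]
    return actions


print(substep_actions(["root", "child"], [("root", "child")]))
print(substep_actions(["root", "child"], [("root", "child")], two_way=False))
```

Placing the pressure solver last means that any divergence introduced by inserting the anterpolated child solution into the parent is removed in the same sub-step.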

The current implementation imposes a few restrictions on nested setups. Moreover, the interpolation and anterpolation methods discussed in the following sections rely on certain assumptions, e.g., that grid lines match between parent and child domains, which lead to a few further restrictions. Altogether, these restrictions are as follows:

the child domain must always be completely inside its parent domain, and there must be a margin of four parent grid cells between the boundaries of child and parent domains;

parallel child domains must not overlap each other, and there must be a margin of at least four child grid cells between two parallel child domains;

the domain decomposition of all child domains must be such that the sub-domain size is never smaller than the parent grid spacing in the respective direction;

buildings or any other topography or geometry must not reach the child domain top;

all the grid-spacing ratios must be integer-valued;

the outer boundaries of child domains must coincide with grid planes of their parent domain;

no grid stretching is allowed in the child domains, and in the root domain it is allowed only above the top boundary of the highest child domain.
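Several of these restrictions are simple geometric checks that could be performed before a run starts. The sketch below tests three of them along one horizontal axis (integer grid-spacing ratio, child boundaries on parent grid planes, and the four-cell margin), assuming uniform grids with the parent axis starting at zero. The function `check_nest_1d` is hypothetical and is not PALM's setup check; the decomposition, topography, and stretching restrictions are not covered.

```python
from typing import List


def check_nest_1d(p_dx: float, c_dx: float, p_len: float, c_start: float,
                  c_len: float, margin_cells: int = 4, tol: float = 1e-9) -> List[str]:
    """List the nesting restrictions violated along one horizontal axis.

    Assumes the parent axis starts at 0 and that both grids are uniform.
    """
    problems = []
    ratio = p_dx / c_dx
    if abs(ratio - round(ratio)) > tol:
        problems.append("grid-spacing ratio is not an integer")
    # Outer child boundaries must coincide with parent grid planes.
    for edge in (c_start, c_start + c_len):
        k = edge / p_dx
        if abs(k - round(k)) > tol:
            problems.append(f"child boundary at {edge} is not on a parent grid plane")
    # Keep the required margin of parent grid cells to the parent boundaries.
    margin = margin_cells * p_dx
    if c_start < margin - tol or c_start + c_len > p_len - margin + tol:
        problems.append("child domain is too close to the parent boundary")
    return problems


print(check_nest_1d(10.0, 5.0, 1000.0, 200.0, 400.0))   # []
print(check_nest_1d(10.0, 4.0, 1000.0, 200.0, 400.0))   # non-integer ratio
print(check_nest_1d(10.0, 5.0, 1000.0, 20.0, 400.0))    # margin violated
```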

Ideally, the coupling actions, i.e., data transfers between the domains, anterpolation, and interpolation, would be performed after the pressure-correction step using the final divergence-free velocity fields of both parent and child. Achieving this with the pressure-correction method employed in PALM would require a staged arrangement of the coupling actions: a child first sends data to its parent, which, after receiving the data, anterpolates and performs the pressure-correction step. After that, the parent sends data to the child, which interpolates new boundary conditions from the received data and performs its own pressure-correction step. The purely vertical nesting method implemented in PALM by

Flow chart illustrating the nesting actions in the case of three domains in cascading order. In the case of more than three domain levels, more branches similar to the current middle branch would be added. Blue boxes represent baseline PALM actions, while the other colors indicate nesting-specific actions. In one-way coupling only the actions indicated by pink color are invoked.

In the present method, the overestimation of the horizontal velocity component variances can be mostly avoided by using the integral mass-balance forcing (Eq.

In addition to the velocity field, all other prognostic variables are also coupled, except for the SGS turbulent kinetic energy (

Further technical implementation issues are discussed in Appendix

Conservation of fluxes through the nest boundaries is an essential condition for a nesting algorithm. By flux conservation we mean that the total flux through a nest–domain boundary is equal to the total flux through the corresponding plane in the parent domain. This must not be confused with the mass-conservation error discussed in Sect.

Earlier studies by

In principle, the most straightforward way to satisfy flux conservation is to directly use the flux on the parent grid cell face on the nested boundary and to distribute it onto the underlying child grid cell faces akin to the finite-volume method. However, PALM is based on the finite-difference method, and thus its architecture does not support this method. Therefore, it is necessary to construct an interpolation procedure that is at least approximately flux-conservative.
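To illustrate the finite-volume principle that PALM cannot use directly, the sketch below takes velocity values on the child faces that cover one parent face (e.g., obtained from some interpolation) and shifts them so that their total volume flux equals the parent flux. This is a generic illustration assuming equal-area child faces, not the interpolation procedure developed in this paper; `distribute_parent_flux` is a hypothetical name.

```python
from typing import List


def distribute_parent_flux(u_parent: float, u_child: List[float]) -> List[float]:
    """Shift child face velocities so that their mean equals the parent value.

    With equal-area child faces exactly covering the parent face, equal means
    imply equal total volume fluxes through the face.
    """
    shift = u_parent - sum(u_child) / len(u_child)
    return [u + shift for u in u_child]


# Parent face of 4 m x 4 m covered by 2 x 2 child faces of 2 m x 2 m (ratio 2).
a_parent, a_child = 16.0, 4.0
u_p = 2.0
u_c = distribute_parent_flux(u_p, [1.5, 1.8, 2.4, 2.5])   # e.g. interpolated values
print(u_c, sum(u * a_child for u in u_c), u_p * a_parent)
```

The correction preserves the sub-face structure of the interpolated values (their differences are unchanged) while enforcing the flux constraint exactly.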

Before laying out the new interpolation procedure in Sect.

We first consider the work by

In order to ensure flux conservation for all prognostic variables,

Staggered velocity component nodes in cases of odd (3)

In our view, the limitations of the method by

In another deviation from the

As stated above, the flux conservation condition is satisfied approximately for

Above, the flux conservation was discussed generally without taking into account the effects of the actual discretization scheme employed in the advection algorithm. In this subsection, a mismatch of the advection term approximations on the child and parent sides in PALM is first identified and discussed, and subsequently a method to reduce the flux-conservation error resulting from this mismatch is proposed.

In Sect.

The first-order upwind scheme makes the advected values on child-boundary flux points independent of the child solution itself whenever the local flow is directed into the child domain. This is important from a flux-conservation point of view, as the flux into the child domain should be entirely controlled by the parent solution. On the other hand, the first-order upwind scheme yields values on the child-boundary flux points that may differ from those on the corresponding grid plane on the parent side, because the latter are interpolated with the fifth-order scheme. Therefore, additional flux-conservation errors may arise.
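The inflow/outflow switch of the first-order upwind scheme at a child boundary can be sketched as follows for a left boundary, where a positive velocity points into the child. The function and argument names are hypothetical, and the sketch ignores the rest of PALM's advection scheme.

```python
def upwind_boundary_flux(u_face: float, phi_ghost: float, phi_inner: float) -> float:
    """First-order upwind advective flux u*phi through a left child boundary.

    u_face    : velocity at the boundary flux point (positive = into the child)
    phi_ghost : scalar in the boundary ghost node, i.e. interpolated from the parent
    phi_inner : scalar in the first prognostic child node
    """
    # Inflow: only the parent-supplied ghost value enters, so the incoming flux
    # is fully controlled by the parent solution. Outflow: use the child value.
    phi_face = phi_ghost if u_face > 0.0 else phi_inner
    return u_face * phi_face


print(upwind_boundary_flux(2.0, 300.0, 301.0))    # inflow uses 300.0
print(upwind_boundary_flux(-2.0, 300.0, 301.0))   # outflow uses 301.0
```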

We have not found any way to totally eliminate the resulting additional flux-conservation error, but we can reduce it in the following way. Instead of using the original parent grid values in the parent-to-child interpolation, we replace them with values pre-interpolated (Fig.

A schematic illustration of the interpolation operations on the left child boundary as an example. The child grid nodes on the left side are boundary ghost nodes. Phase 1 operations belong to the TBP, and phase 2 operations belong to the actual parent-to-child interpolation using Eqs. (

The TBP must not employ more than one parent grid layer behind a child domain boundary, because the child has no information about the parent domain geometry beyond the first parent grid layer. A wider interpolation stencil could penetrate a vertical wall, leading to erroneous interpolation. Therefore, the best available choice is simply to use the average of the parent grid values on both sides of the child domain boundary, i.e., a second-order interpolation. This obviously differs from the fifth-order scheme, but we argue that the difference between values interpolated onto the boundary plane with the fifth-order and second-order schemes can be expected to be smaller than the difference between values interpolated with the fifth-order and first-order schemes. Our numerical tests support this argument.
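The second-order TBP value is the mean of the two parent nodes straddling the child boundary, so its stencil never extends beyond the first parent layer outside the child. The short sketch below, with a hypothetical `tbp_boundary_value` function, also shows that this average is exact for a linear profile, whereas the first-order upwind value is not.

```python
def tbp_boundary_value(phi_outside: float, phi_inside: float) -> float:
    """Second-order value on the child boundary plane from the two parent nodes
    immediately on either side of it (no wider stencil is used)."""
    return 0.5 * (phi_outside + phi_inside)


def phi(x: float) -> float:
    """Linear parent profile; the child boundary plane lies at x = 0."""
    return 3.0 + 2.0 * x


second = tbp_boundary_value(phi(-0.5), phi(0.5))   # exact for linear profiles
first = phi(-0.5)                                  # first-order upwind (inflow)
print(second, first, phi(0.0))
```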

On the top boundary there is no geometry, and hence we can use a wider TBP stencil there. We ended up using the third-order

The sequence of interpolation operations is illustrated in Fig.

We evaluated the flux-conservation error in a simple test run modeling a horizontally homogeneous slightly convective boundary layer over flat terrain with capping inversion at

According to our numerical tests presented in the Sect.

Anterpolation is used to feed the child domain solution back to its parent domain. Generally, anterpolation consists of filtering the fine-grid child solution

Buffer zones, in which anterpolation is omitted, are applied next to the child domain boundaries (except the bottom boundary). Their main purpose is to avoid an unstable feedback loop between anterpolation and interpolation. The default width of these buffer zones is two prognostic grid nodes. The user may choose a different buffer width, but the minimum allowed width is one parent grid spacing. This is because the layer of nodes nearest to the child boundary is used directly in the interpolation, and using anterpolated values for interpolation leads to strongly unstable behavior. The buffer zones are comparable to the relaxation zones applied in the nesting system of the WRF-LES model
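A one-dimensional sketch of anterpolation with lateral buffer zones is given below. It assumes a top-hat filter, i.e., each parent value inside the nest is replaced by the plain average of the child values within that parent cell, and it counts the buffer width in parent cells; both are assumptions for illustration, and `anterpolate_1d` is a hypothetical name rather than a PALM routine.

```python
from typing import List


def anterpolate_1d(parent: List[float], child: List[float], i_start: int,
                   ratio: int, buffer: int = 2) -> List[float]:
    """Overwrite parent cells inside the nest with top-hat averages of the child.

    parent : parent cell values along x
    child  : child cell values; child cells j*ratio ... (j+1)*ratio - 1 lie inside
             parent cell i_start + j
    buffer : number of parent cells next to each lateral child boundary that are
             left untouched (must be at least one)
    """
    if buffer < 1:
        raise ValueError("buffer must be at least one parent grid spacing")
    n = len(child) // ratio                      # parent cells covered by the child
    out = list(parent)
    for j in range(buffer, n - buffer):
        block = child[j * ratio:(j + 1) * ratio]
        out[i_start + j] = sum(block) / ratio    # top-hat filter over the parent cell
    return out


child = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]     # ratio 2 -> covers 6 parent cells
print(anterpolate_1d([0.0] * 10, child, i_start=2, ratio=2, buffer=1))
```

The parent cells in the buffer (here indices 2 and 7) keep their own values, so the parent nodes used for interpolating the child boundary conditions are never overwritten by anterpolated data.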

The anterpolation algorithm is implemented in PALM with an option to apply it selectively in space, such that the operation is performed only in the part of the computational domain that lies above a user-defined height threshold. This practice was found to resolve complications that arise when two-way coupled nesting is applied in obstacle-resolving LES, where the anterpolated solution within the obstacle canopy introduces discrepancies into the coarser parent solution. We therefore label this approach canopy-restricted (CR) anterpolation and refer to the coupling as two-way CR. The necessity of this anterpolation strategy is motivated and its effectiveness demonstrated in Sect.
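In the vertical, canopy-restricted anterpolation amounts to selecting which parent levels are anterpolated. The sketch below, with a hypothetical `anterpolation_levels` function and made-up level heights, excludes a buffer below the child top (but none at the bottom) and, for two-way CR coupling, every level at or below the user-defined threshold height.

```python
from typing import List


def anterpolation_levels(z_parent: List[float], child_top: float,
                         z_threshold: float = 0.0, top_buffer: int = 2) -> List[int]:
    """Indices of parent vertical levels where anterpolation is performed.

    z_parent    : heights of the parent grid levels
    child_top   : height of the child domain top
    z_threshold : canopy-restriction height; levels at or below it are skipped
    top_buffer  : parent levels below the child top excluded as a buffer zone
    """
    inside = [k for k, z in enumerate(z_parent) if z < child_top]
    if top_buffer > 0:
        inside = inside[:-top_buffer]            # no buffer at the bottom boundary
    return [k for k in inside if z_parent[k] > z_threshold]


z = [1.0 + 2.0 * k for k in range(10)]           # parent levels 1, 3, ..., 19 m
print(anterpolation_levels(z, child_top=16.0))                   # plain two-way
print(anterpolation_levels(z, child_top=16.0, z_threshold=6.0))  # two-way CR
```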

In order to evaluate the nesting strategy, show its benefits, and point out its limits, we performed a series of nested model simulations with different grid-spacing ratios, together with the respective non-nested reference simulations, for different atmospheric situations. The main aim is not to validate the PALM model against experimental data, but to systematically compare the nested-domain results with corresponding non-nested fine- and coarse-grid reference results and to show that the nested-domain solutions are closer to the fine-grid reference solutions than to the coarse-grid reference solutions. PALM has already been evaluated for various ABL flows against measurement data

We simulated a homogeneously heated flat-terrain convective boundary layer and a purely shear-driven flat-terrain boundary layer. Further, to investigate the performance of the grid nesting in more complex situations with non-flat topography, we performed two-staged nested simulations of a neutrally stratified flow over a smooth three-dimensional hill and compared the results against wind tunnel data. These three test cases were simulated using two-way coupling only. Finally, we simulated a neutrally stratified urban boundary layer flow over a regular staggered arrangement of building cubes using one- and two-staged nesting and compared the nested simulation results to corresponding non-nested fine- and coarse-grid simulation results. Details of the different simulation setups are given in the respective sections. Note that, for the sake of simplicity, velocity components are hereafter denoted by lowercase variable names only, regardless of whether they refer to the flow in the parent or the child domain.

The nesting method is first evaluated for a purely convective boundary layer (CBL) with zero mean wind. We set up one child domain centered within the parent domain. Cyclic lateral boundary conditions were set for the root domain. A homogeneous and time-constant surface sensible heat flux of

Instantaneous horizontal

Figure

The 30 min time-averaged and horizontally averaged profiles of

Figure

The heat flux profiles in the child and parent simulations decrease linearly with height within the CBL and are in good agreement with the fine-reference simulation. For the parent simulation we note the near-surface kink in the heat flux (see Fig.

The variances of the horizontal and vertical velocity components, as well as the skewness of the vertical component, depend strongly on the grid spacing, as the coarse- and fine-grid reference simulations show: the variances become smaller and the skewness larger with increasing grid spacing. The parent simulation agrees well with the coarse-resolution simulation, indicating that the anterpolation changes the parent flow field only marginally. The variances and skewness in the child simulations agree with the fine-reference profiles, except in the upper regions of the child domain, where the variances are slightly overestimated. The child profiles are close to the reference profiles and, in the studied cases, almost independent of the chosen grid-spacing ratio.

Although there is no mean horizontal advection in the zero-mean-wind CBLs, spatially and temporally local horizontal advection always takes place, so flow structures are locally advected from parent to child (and vice versa). Such advected flow structures may therefore need a certain fetch to adjust to the changed grid spacing. To estimate how far from the lateral child boundaries turbulence properties similar to those of a non-nested fine-resolution reference simulation are reached, we performed a spectral analysis. To this end, we sampled time series of turbulent kinetic energy (TKE) and

Frequency spectra of the TKE

Horizontal cross section of 5 h time-averaged vertical velocity at

Even though the child simulations yield turbulence profiles, spectra, and instantaneous flow patterns similar to the fine-grid reference simulation for a pure buoyancy-driven flow, the nested simulation nevertheless creates side effects on the flow which appear as a secondary circulation (SC). This SC is not caused by a violation of mass conservation that has been discussed in Sect.

SCs develop above surface heterogeneities mainly due to differential surface heating of the air, which results in mean updrafts over the more strongly heated patches and mean downdrafts over the less strongly heated ones. However, since we prescribe the same surface sensible heat flux in the parent and child simulations, differential surface heating cannot be the reason for the SC in the nested simulation.

Even though this inherent, artificially induced SC only appears when the flow is averaged over a long time under quasi-stationary conditions (no diurnal cycle, no change in the mean wind, etc.), nested simulation results should be interpreted carefully with respect to SCs. In particular, since the strength of the artificial SC is of the same order as that of “real-world” circulations over heterogeneous terrain, the two may become superimposed, altering the pattern of the vertical transport of sensible and latent heat. Although we did not succeed in proving our hypothesis, we encourage other researchers to look for such SCs in any nested model by analyzing time-averaged results.

As further test cases, we set up a series of boundary layer flow simulations with increasing order of complexity. First, to evaluate the performance of grid nesting in shear-driven boundary layer flows, we simulated a flow over a homogeneous flat surface in order to compare first- and second-order moments from a nested simulation against reference simulations. In a second step, we simulated a flow over a smooth three-dimensional hill for comparison of nested simulation results against wind tunnel data. Finally, in order to illustrate the advantages of the grid nesting in more complex setups, we simulated a flow over a staggered arrangement of cubes mounted on a flat surface.

The parent domain size for all neutrally stratified simulations was

In order to obtain a turbulent inflow, we applied a turbulence recycling method according to

Further, in order to avoid persistent streaks in the

In order to evaluate the effect of the nesting, we performed additional non-nested reference simulations with 4 and 2

Figure

Instantaneous horizontal cross section of the

The 2 h time-averaged and

Figure

Figure

The 2 h time-averaged and

Figure

Frequency spectra of the resolved-scale TKE taken at different sampling locations downstream of the inflow boundary for the neutrally stratified boundary layer at

In order to further analyze the flow adjustment within the child domain, we computed resolved-scale TKE spectra at different distances from the child inflow boundary. The spectra were calculated from time series of the three velocity components sampled at different locations within the domain. The final spectra were then obtained by averaging the individual spectra over all locations with the same distance to the inflow boundary, assuming that the flow is parallel to the
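The spectral post-processing described above can be sketched in plain Python: a one-sided spectrum per velocity component, a TKE spectrum as half their sum, and an average over all sampling locations at the same distance from the inflow boundary. This is an illustration under stated assumptions (plain DFT, mean removal, no windowing or segment averaging), not the processing actually used for the figures; all function names are hypothetical.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple


def one_sided_spectrum(x: Sequence[float], dt: float) -> Tuple[List[float], List[float]]:
    """Mean-removed one-sided spectrum via a plain DFT; sum(power) == variance."""
    n = len(x)
    mean = sum(x) / n
    xs = [v - mean for v in x]
    freqs, power = [], []
    for m in range(1, n // 2 + 1):
        c = sum(xs[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        p = abs(c) ** 2 / n ** 2
        if not (n % 2 == 0 and m == n // 2):
            p *= 2.0                      # fold in the negative frequencies
        freqs.append(m / (n * dt))
        power.append(p)
    return freqs, power


def tke_spectrum(u, v, w, dt):
    """Resolved-scale TKE spectrum: half the sum of the component spectra."""
    f, su = one_sided_spectrum(u, dt)
    _, sv = one_sided_spectrum(v, dt)
    _, sw = one_sided_spectrum(w, dt)
    return f, [0.5 * (a + b + c) for a, b, c in zip(su, sv, sw)]


def average_by_distance(spectra: Dict[float, List[List[float]]]) -> Dict[float, List[float]]:
    """Average all spectra that share the same distance to the inflow boundary."""
    return {d: [sum(col) / len(col) for col in zip(*specs)]
            for d, specs in spectra.items()}


n, dt = 64, 1.0
u = [math.sin(2 * math.pi * 4 * k / n) for k in range(n)]   # pure 4-cycle signal
v = [0.0] * n
w = [0.0] * n
f, e = tke_spectrum(u, v, w, dt)
print(f[e.index(max(e))], sum(e))                          # 0.0625 and ~0.25
```

The plain DFT costs O(n²) operations per series and only serves to make the steps explicit; for long time series, an FFT-based routine would normally be used.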

In contrast to a buoyancy-driven boundary layer, the flow in a purely shear-driven boundary layer requires a sufficiently large development distance to adjust to the finer grid resolution. However, a purely shear-driven flow over a flat homogeneous surface can certainly be considered as an extreme case in terms of flow adjustment, as the vertical turbulent exchange, which is primarily driven by surface-roughness-induced shear, is rather low compared to less idealized flows over non-flat terrain or with obstacles included. Hence, we expect that the required fetch length may decrease for rougher surfaces and more complex surface geometries.

The hill case is studied to compare flow statistics against the wind tunnel observations conducted by

The terrain height of the smooth three-dimensional hill is given by

The 2 h time-averaged vertical cross section of the

The 2 h time-averaged vertical profiles of the standard deviation of the

The 2 h time-averaged vertical profiles of the standard deviation of the

Figure

Root-mean-square deviation (RMSD) of the simulated

The final test case features a neutral atmospheric boundary layer flow over flat terrain, which encounters a staggered array of cubical obstacles. The resulting flow resembles urban canopy turbulence, in which the interaction between roughness elements and ABL turbulence is explicitly resolved. Here, the cubical obstacle height is

An overview of the cubical obstacle case layout. The obstacles are cubes with 40 m sides. The figure displays two nested arrangements: version 1 (v1), featuring a root domain and a secondary nest domain, and version 2 (v2), which also includes a tertiary nest domain embedded within a larger secondary nest. The root and nested domains are indicated with

To demonstrate the flexibility of the nesting implementation, we carried out simulations with two different nested configurations illustrated in Fig.

First, in the context of obstacle-resolved LES, we motivate the employment of an optional canopy-restricted (CR) anterpolation strategy introduced in Sect.

Instantaneous close-up view of vorticity magnitude (

The visualization indicates the strength and spatial structure of the resolved turbulent eddies and how they are affected by grid resolution. The differences are substantial. In such obstacle-resolving LES, the increased grid resolution can alter the flow solution to such a degree that anterpolation introduces details into the coarser parent that are inconsistent with the rest of the parent's flow solution. Particularly in blunt-body obstacle canopy flows, this discrepancy clearly manifests itself as a locally changed resultant pressure drag (caused by the obstacles) within the anterpolated domain. To inspect this, we compute the resultant pressure drag coefficient

Resultant pressure force coefficients

This problematic behavior is significantly abated by adopting the CR anterpolation strategy setting and by setting the vertical threshold at

To further evaluate the nesting performance, we exploit root-(normalized)-mean-square difference (RNMSD or RMSD) and fractional bias (FB) as comparison metrics
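Since the metric definitions are not reproduced here, the sketch below uses commonly applied forms, which should be read as assumptions: RMSD as the square root of the mean squared difference, RNMSD as RMSD normalized by the root-mean-square of the reference, and FB as twice the difference of the means divided by their sum. The function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def rmsd(sim: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square difference between nested and reference values."""
    return math.sqrt(sum((s - r) ** 2 for s, r in zip(sim, ref)) / len(ref))


def rnmsd(sim: Sequence[float], ref: Sequence[float]) -> float:
    """RMSD normalized by the RMS of the reference (assumed normalization)."""
    rms_ref = math.sqrt(sum(r * r for r in ref) / len(ref))
    return rmsd(sim, ref) / rms_ref


def fractional_bias(sim: Sequence[float], ref: Sequence[float]) -> float:
    """FB = 2 (mean_ref - mean_sim) / (mean_ref + mean_sim); 0 means no bias.

    With this sign convention, FB > 0 indicates that sim underestimates ref.
    """
    ms = sum(sim) / len(sim)
    mr = sum(ref) / len(ref)
    return 2.0 * (mr - ms) / (mr + ms)


ref = [4.0, 5.0, 6.0]          # e.g. fine-reference u at one height (hypothetical)
sim = [3.5, 4.5, 5.5]          # e.g. nested-child u at the same points
print(rmsd(sim, ref), rnmsd(sim, ref), fractional_bias(sim, ref))
```

Both the normalization in RNMSD and the denominator in FB become ill-conditioned when the reference values or means are close to zero, which is a plausible reason to report plain RMSD for some quantities.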

Both model variants (v1 and v2) are included in the analysis to demonstrate how the size and placement of the child domains affect the metrics and to illustrate the possibility of employing a cascade of nested domains. Although no comparison metrics are presented for the second child solution featuring 1 m resolution, its influence is embedded in the solution of the first child.

Vertical distributions of root-(normalized)-mean-square difference (RNMSD or RMSD) of velocity components for configuration v1

Vertical distribution of RNMSD of the velocity variances for configuration v1

Fractional bias (FB) values evaluated over 15

The RNMSD and RMSD profiles for the velocity components and their variances depicted in Figs.

The one-way coupling approach consistently performs better than the unmodified two-way coupled approach in all metrics, but it is also associated with a systematic bias that is larger than that of the coarse reference. However, if the modest systematic shift in streamwise velocity can be accepted, one-way coupling offers a cost-effective nesting approach (see Sect.

Table

Total number of grid points

The interpolation, the anterpolation, and the accompanying inter-model data transfer consume CPU time themselves. In our tests, the workload in terms of the number of grid points per processor element was the same in the parent and child simulations. With this optimal configuration, two-way nesting consumed about 10 %–16 % of the CPU time, whereas one-way nesting consumed only about 2 %. This suggests that most of the CPU time taken by two-way nesting is consumed by the anterpolation and the associated child-to-parent data transfer.

It is important to bear in mind that if the workload between child and parent processes is not well balanced, the faster processes must wait for the slower ones to reach the coupling point before the data transfer can start, which reduces the computational efficiency of the nesting.

This article documents and evaluates an online LES–LES nesting scheme implemented into the PALM model system 6.0. The nesting system relies on the post-insertion approach and features both one-way and two-way coupling approaches. We give a detailed description of the model's relevant technical, algorithmic, and numerical aspects and provide evidence for the accuracy gains the method introduces with a dramatically reduced computational cost compared to globally refined grid resolution. The nesting approach has proven particularly essential in urban boundary layer studies requiring obstacle-resolving LES.

The implementation of this three-dimensional nesting system is based on two-level parallelism involving inter-model and intra-model parallelization using the MPI. This enables our nesting implementation to flexibly support multiple child domains, which can be nested within their parent domain either in a parallel or recursively cascading configuration. All solutions involved within the nested simulation are advanced using a globally synchronized time step, whereas the coupling between each parent–child pair is performed with interpolation (parent to child) and anterpolation (child to parent) operations.

The nesting method is evaluated by performing a series of numerical experiments with an objective to demonstrate that the refined child solution (nested within a coarser parent) approaches the non-nested reference solution obtained by employing fine resolution globally.

The first test case features a horizontally homogeneous convective boundary layer (CBL) with no mean wind. In this case, first- and second-order boundary layer statistics are well captured in the child domain and closely match the non-nested high-resolution reference statistics. Further, owing to the local nature of turbulence production and the weak advection from parent into child, the flow statistics show almost no dependence on the distance to the child boundaries. However, for averaging times several hours long, we found that a nonphysical secondary circulation develops even though the surface heating is homogeneous. We hypothesize that this secondary circulation is an inherent consequence of the spatially changing description of flow physics in the parent and child solutions. Although we demonstrated this issue with a rather idealized setup and an unrealistically long averaging time, and such nonphysical circulations are probably weaker in less idealized simulations (e.g., those with wind, a diurnal cycle, or realistic terrain surfaces), we believe that this should be kept in mind when applying the nesting system to CBL problems.

The second test case simulated a neutrally stratified boundary layer flow over flat terrain. The nested simulations reveal that the flow within the child domain must undergo a development phase, during which it adjusts to the higher resolution before reaching an equilibrium state again. The required development length depends on the grid-spacing ratio between parent and child. However, a purely shear-driven flow over homogeneous flat terrain can be considered an extreme scenario with respect to the development length of turbulence, whereas in cases with more complex surface geometry the flow adapts within shorter development distances. Beyond the development distance, the child solutions for grid-spacing ratios of 2 and 3 agree well with the non-nested fine-reference solution, whereas for a grid-spacing ratio of 4 the results clearly deviate from it.

The third numerical experiment featured a boundary layer flow (similar to the second test case) over a smooth three-dimensional hill. This test case also uses wind tunnel measurements to strengthen the evaluation of the nesting model. In this case, the flow statistics on the windward and leeward sides of the hill are almost the same as in a fine-reference simulation and agree well with the wind tunnel observations presented in

The final test case examines a flow system in which a fully developed boundary layer flow encounters a staggered arrangement of cube-shaped obstacles. This flow scenario closely resembles an obstacle-resolving urban boundary layer flow situation. The case revealed that in two-way coupled simulations the anterpolated child solution introduces discrepancies within the parent domain, which manifest as elevated pressure drag within the anterpolated zone. This complication is remedied by introducing a canopy-restricted anterpolation approach, in which anterpolation is omitted within the obstacle canopy. Using the root-normalized-mean-square difference and the fractional bias as comparison metrics to quantify the difference between the fine-reference and nested solutions, we show that canopy-restricted two-way coupling is the best coupling strategy for obstacle-resolving LES studies.

Future development is planned to include the following tasks. The first is the incorporation of PALM's Lagrangian particle model into the nesting system to enable Lagrangian dispersion studies in urban environments, such that particles can be transferred between parent and child domains depending on their position. Thus, the long-distance transport of, e.g., pollutants can be simulated in a coarse-resolution parent grid, while dispersion on the street scale at specific locations can be simulated in a fine-resolution child domain. We note that this has already been implemented in PALM and is available to users, but further sensitivity tests with respect to the treatment of stochastic subgrid-scale particle speeds
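The position-based hand-over of particles between parent and child can be illustrated with a minimal sketch: a particle belongs to the finest (deepest) domain whose extent contains it. The `Domain` record and `owning_domain` function are hypothetical and do not reproduce PALM's particle code; the sketch assumes axis-aligned domains, that parallel children do not overlap, and that each child extends from the surface up to its own top.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Domain:
    """Axis-aligned extent of one model domain (hypothetical representation)."""
    model_id: int
    level: int      # nesting depth: 0 for the root, 1 for its children, ...
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    z_top: float    # a child domain may be lower than its parent

    def contains(self, x: float, y: float, z: float) -> bool:
        return (self.x_min <= x < self.x_max
                and self.y_min <= y < self.y_max
                and 0.0 <= z < self.z_top)


def owning_domain(x: float, y: float, z: float, domains: Sequence[Domain]) -> int:
    """Return the id of the finest (deepest) domain that contains the particle."""
    candidates = [d for d in domains if d.contains(x, y, z)]
    if not candidates:
        raise ValueError("particle is outside the root domain")
    return max(candidates, key=lambda d: d.level).model_id


domains = [
    Domain(1, 0, 0.0, 2000.0, 0.0, 2000.0, 1000.0),    # coarse root
    Domain(2, 1, 500.0, 1000.0, 500.0, 1000.0, 200.0),  # fine street-scale child
]
print(owning_domain(700.0, 700.0, 50.0, domains))   # 2: inside the child
print(owning_domain(700.0, 700.0, 500.0, domains))  # 1: above the child top
print(owning_domain(100.0, 100.0, 50.0, domains))   # 1: outside the child
```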

Furthermore, to date the time step in all parent and child models has been synchronized and restricted to the minimum of the time steps determined independently by each model using the CFL criterion. In our experience, the global time step is often restricted by the flow around building edges, where high wind speeds occur within the fine-grid child domains. Hence, we plan to implement a time-splitting scheme in PALM in which the parent and child models are coupled only at the end of each parent time step. This would allow coarser-scale parent domains to run with larger time steps. Thus, computational time could be saved in the time integration of the parent simulation as well as in the inter-model communication between parent and child.
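The difference between the current synchronized time step and the planned time splitting can be made concrete with a small calculation. The sketch considers only the advective CFL limit (PALM also applies other criteria), uses an assumed safety factor of 0.9, and the function names and wind speeds are hypothetical.

```python
import math


def cfl_time_step(dx, dy, dz, u_max, v_max, w_max, cfl_factor=0.9):
    """Advective CFL limit of one model: cfl_factor * min(dx/|u|, dy/|v|, dz/|w|)."""
    limits = [d / abs(s) for d, s in ((dx, u_max), (dy, v_max), (dz, w_max)) if s != 0.0]
    return cfl_factor * min(limits) if limits else math.inf


def synchronized_time_step(model_steps):
    """Current approach: every model advances with the smallest individual step."""
    return min(model_steps)


def split_time_step(dt_parent, dt_child):
    """Planned time splitting (hypothetical sketch): the child takes n_sub equal
    substeps per parent step, so both models meet at the end of each parent step."""
    n_sub = math.ceil(dt_parent / dt_child)
    return n_sub, dt_parent / n_sub


dt_parent = cfl_time_step(4.0, 4.0, 4.0, 10.0, 10.0, 5.0)  # coarse parent, about 0.36 s
dt_child = cfl_time_step(2.0, 2.0, 2.0, 12.0, 12.0, 6.0)   # fine child near buildings, about 0.15 s
print(synchronized_time_step([dt_parent, dt_child]))  # parent forced down to the child's step
print(split_time_step(dt_parent, dt_child))            # 3 child substeps of about 0.12 s
```

With synchronization the parent performs 0.36/0.15 = 2.4 times as many steps as its own CFL limit requires; with splitting it would take one step while the child takes three, and parent-child data would be exchanged once instead of three times per parent step.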

The nested model system is implemented using two levels of MPI communicators. The inter-model communication (communication between model domains) is handled by a global communicator using the one-sided communication pattern (remote memory access, RMA). The intra-model communication (communication between subdomains within each model domain) is two-sided and is handled by a 2-D communicator that has a different color for each model. The intra-model communication system is the baseline parallelization of PALM
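The "one color per model" idea corresponds to the grouping rule of MPI_COMM_SPLIT: processes passing the same color end up in the same new communicator, ordered by their key and, for equal keys, by their rank in the old communicator. The following single-process emulation only illustrates that rule; real code would call MPI_Comm_split (e.g. `comm.Split(color, key)` in mpi4py), and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def emulate_comm_split(colors: Sequence[int],
                       keys: Optional[Sequence[int]] = None) -> Dict[int, List[int]]:
    """Group old ranks the way MPI_Comm_split does.

    Ranks with the same color end up in the same new communicator; inside it they
    are ordered by key, with ties broken by their rank in the old communicator.
    Returns {color: [old ranks in new-rank order]}.
    """
    if keys is None:
        keys = [0] * len(colors)
    groups: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for old_rank, (color, key) in enumerate(zip(colors, keys)):
        groups[color].append((key, old_rank))
    return {color: [rank for _, rank in sorted(members)]
            for color, members in groups.items()}


# Eight processes shared by three models: 4 for the root, 2 for each child.
split = emulate_comm_split([1, 1, 1, 1, 2, 2, 3, 3])
print(split)  # {1: [0, 1, 2, 3], 2: [4, 5], 3: [6, 7]}
```

The position of a process in its model's list is its new rank, so every model numbers its processes from 0, exactly as a non-nested PALM run would.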

Data transferred from parent to child and from child to parent is always stored in the coarser parent model grid in order to minimize the amount of data transfer. This means that the interpolations and anterpolations are always performed by the child. For this purpose, children contain auxiliary arrays that follow the parent grid spacings and indexing for each prognostic variable to be coupled, covering the overlap domain plus the necessary number of ghost node layers.
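Storing the exchanged data on the parent grid reduces the transfer volume considerably: with a grid-spacing ratio of 3 in all directions, a 3-D block holds 3^3 = 27 times fewer coarse-grid than fine-grid values. A minimal sketch of how the extent of such an auxiliary array could be found along one axis follows. It assumes that parent cell i spans [i dx_p, (i+1) dx_p) and that the child boundaries coincide with parent grid lines; the function name and the number of ghost layers are assumptions.

```python
import math


def parent_index_range(child_x0, child_nx, child_dx, parent_dx, n_ghost=2):
    """Inclusive parent-grid cell indices a child must hold along one axis.

    Covers the parent cells overlapping the child domain plus n_ghost extra
    parent cells on each side for the interpolation stencil (value assumed).
    """
    child_x1 = child_x0 + child_nx * child_dx
    i_first = math.floor(child_x0 / parent_dx) - n_ghost
    i_last = math.ceil(child_x1 / parent_dx) - 1 + n_ghost
    return i_first, i_last


# Child: 300 cells of 2 m starting at x = 402 m; parent: 6 m cells (ratio 3).
i_first, i_last = parent_index_range(402.0, 300, 2.0, 6.0)
print(i_first, i_last, i_last - i_first + 1)  # 65 168 104 (100 overlap + 2 * 2 ghost)
```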

The mappings between the domain decompositions of each parent and child model, as well as all other necessary index mappings, are determined in the initialization phase and stored, so that the coupling actions during time stepping are straightforward and efficient.
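As an illustration of such a decomposition mapping, the sketch below determines which parent subdomains a given child subdomain has to exchange data with, by intersecting the parent-grid index range the child needs with the index ranges owned by the parent processes along x and y. It assumes a 2-D horizontal decomposition described by inclusive index bounds per process column and row; all names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Bounds = Sequence[Tuple[int, int]]  # inclusive (start, end) index per process column or row


def overlapping(bounds: Bounds, lo: int, hi: int) -> List[int]:
    """Process coordinates along one axis whose index range intersects [lo, hi]."""
    return [p for p, (start, end) in enumerate(bounds) if start <= hi and end >= lo]


def parent_procs_for_child(px_bounds: Bounds, py_bounds: Bounds,
                           need_x: Tuple[int, int], need_y: Tuple[int, int]):
    """(px, py) coordinates of the parent processes a child process exchanges data with."""
    return list(product(overlapping(px_bounds, *need_x),
                        overlapping(py_bounds, *need_y)))


# Parent: 256 x 256 cells split over a 4 x 4 process grid.
blocks = [(0, 63), (64, 127), (128, 191), (192, 255)]
print(parent_procs_for_child(blocks, blocks, (65, 168), (10, 40)))  # [(1, 0), (2, 0)]
```

Because the decompositions do not change during a run, such lists only need to be computed once and can then be reused in every coupling step.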

Initial conditions for the root are set as in non-nested runs. The root then sends initial field data to its children, which interpolate their own initial conditions from the data received from the root. Next, the first-level children send their data to their children, if any, and so on. The basic interpolation subroutines for child boundary conditions operate only on the ghost nodes behind the child model boundaries. Therefore, a separate three-dimensional interpolation subroutine is implemented to generate initial fields for all nest domains from their parent model fields. It uses the same interpolation algorithm as the interpolations for child boundary conditions.
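The cascading initialization amounts to visiting the nest tree level by level, so that every child is initialized only after its parent. A minimal breadth-first sketch of that ordering follows; the `parent_of` mapping and the function name are hypothetical and do not reflect how PALM stores its layout.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


def initialization_order(parent_of: Dict[int, Optional[int]]) -> List[int]:
    """Order in which models receive initial fields: root first, then level by level.

    parent_of maps each model id to its parent id (None for the root). Every
    parent appears before its children, so a child can always interpolate its
    initial state from an already initialized parent.
    """
    children: Dict[int, List[int]] = defaultdict(list)
    roots = []
    for model, parent in sorted(parent_of.items()):
        if parent is None:
            roots.append(model)
        else:
            children[parent].append(model)
    if len(roots) != 1:
        raise ValueError("exactly one root model is required")
    order, queue = [], deque(roots)
    while queue:                       # breadth-first traversal of the nest tree
        model = queue.popleft()
        order.append(model)
        queue.extend(children[model])
    if len(order) != len(parent_of):
        raise ValueError("nest layout contains a cycle or an unknown parent id")
    return order


# Root 1 with parallel children 2 and 3; model 4 is nested recursively inside 2.
print(initialization_order({1: None, 2: 1, 3: 1, 4: 2}))  # [1, 2, 3, 4]
```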

The data transfer between parents and children is handled by code contained in five Fortran modules that form a module set called PALM model coupler (PMC). Calls to the PMC subroutines are mostly made in the PMC interface module (pmc_interface_mod.f90), so that only a small number of calls to the PMC interface subroutines are needed within the baseline PALM code. In this way, the changes to the baseline code were kept minimal. The PMC interface module also contains subroutines for the nesting-related initialization actions, interpolation, anterpolation, child mass-balance forcing, etc.

While reading the input namelists, the PALM root process checks whether a namelist called “&nesting_parameters” is given in the parameter input file PARIN. If not, the subroutine pmc_init_model resets all nesting-related parameters (coupling_layout etc.) and sets MPI_COMM_WORLD as the base global MPI communicator comm_palm. The run then continues in the standard way without nesting. If the namelist “&nesting_parameters” is found and correctly specified, the root process of the root model distributes this information to all other processes via MPI_COMM_WORLD. Following this, all the necessary nesting-related parameters are determined, and the base communicator is split into a different color for each model based on the model identification number. The term color here means that the communicator has the same name in all models (process groups), but the communicators are nevertheless individual, which guarantees that the communication of one model is not interfered with by the others. The splitting is performed by calling MPI_COMM_SPLIT. Now each model has its own process group and associated individual base communicator color, such that each model's internal communication is not visible to other models. After this, the mappings between models are determined. Each model, except the root model, identifies its parent model and creates an inter-communicator between its own process group and that of its parent model by calling MPI_INTERCOMM_CREATE. In the same way, each model identifies all of its children, if any, and creates inter-communicators between its own process group and those of each of its children. These inter-communicators are only used to transfer setup data between the root processes of the parent and child models. For 3-D model data transfers between parent and child, a specific intra-communicator is created by merging the respective inter-communicator, so that the new intra-communicator contains the processes of both the local and the remote process group. This is done after pmc_init_model, separately for child and parent models (note that a model may be both parent and child), in the subroutines pmc_childinit and pmc_parentinit by calling MPI_INTERCOMM_MERGE. After the PMC initialization, the run of each model proceeds as usual. A Cartesian topology-based communicator comm_2d is created by each model from the color of the base communicator comm_palm using MPI_CART_CREATE.
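The bookkeeping that precedes these MPI calls can be sketched as follows: from a nesting layout, derive the color of every process (the argument later passed to MPI_COMM_SPLIT) and, for each model, its parent and children (the partners for MPI_INTERCOMM_CREATE). The `LayoutEntry` record is hypothetical and does not reproduce PALM's namelist format, and assigning each model a contiguous block of ranks is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class LayoutEntry:
    """One domain of a nesting layout (hypothetical, reduced set of fields)."""
    model_id: int
    parent_id: Optional[int]   # None for the root model
    n_procs: int


def rank_colors(layout: Sequence[LayoutEntry]) -> List[int]:
    """Color of every world rank, assuming contiguous rank blocks per model."""
    colors: List[int] = []
    for entry in layout:
        colors.extend([entry.model_id] * entry.n_procs)
    return colors


def family(layout: Sequence[LayoutEntry], model_id: int) -> Tuple[Optional[int], List[int]]:
    """Parent id and child ids of one model, i.e. the models it needs
    inter-communicators with."""
    by_id = {e.model_id: e for e in layout}
    parent = by_id[model_id].parent_id
    children = [e.model_id for e in layout if e.parent_id == model_id]
    return parent, children


# Recursively cascading layout: 3 is nested in 2, which is nested in the root 1.
layout = [LayoutEntry(1, None, 4), LayoutEntry(2, 1, 2), LayoutEntry(3, 2, 2)]
print(rank_colors(layout))   # [1, 1, 1, 1, 2, 2, 3, 3]
print(family(layout, 2))     # (1, [3]): model 2 is both a child and a parent
```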

The internal model communication is done in the usual way, i.e., by calling the boundary exchange routines. All data transfer between parent and child models is done within the PMC interface, using MPI one-sided communication (RMA). An RMA window is opened on the parent side. To transfer data from parent to child, the parent fills the RMA window via a local copy. After synchronization via MPI_WIN_FENCE, the child processes can fetch the data across the network with MPI_GET. When transferring data in the opposite direction, the child first writes the data into the parent's RMA window via MPI_PUT. After another MPI_WIN_FENCE call, the parent copies the data out of the RMA window into its local model data area.
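The ordering of this fence-synchronized exchange can be illustrated with a single-process mock. It is not MPI: remote get and put calls are merely queued and take effect at the next `fence()`, mirroring the MPI rule that RMA operations issued within an access epoch are guaranteed to be complete only when the closing MPI_Win_fence returns. Real code would use MPI_Win_create, MPI_Win_fence, MPI_Get, and MPI_Put (e.g. through mpi4py's `Win` object); the class and method names here are hypothetical.

```python
from typing import List


class MockFenceWindow:
    """Single-process mock of an RMA window exposed by the parent."""

    def __init__(self, size: int) -> None:
        self.buffer = [0.0] * size          # memory the parent exposes in the window
        self._pending: List[tuple] = []     # queued remote operations of this epoch

    def local_copy_in(self, data: List[float]) -> None:
        """Parent side: fill the window by a local copy (no RMA call needed)."""
        self.buffer[:len(data)] = data

    def get(self, target: List[float], offset: int, count: int) -> None:
        """Child side, analogous to MPI_Get: target is valid only after the next fence."""
        self._pending.append(("get", target, offset, count))

    def put(self, origin: List[float], offset: int) -> None:
        """Child side, analogous to MPI_Put: the window is updated at the next fence."""
        self._pending.append(("put", list(origin), offset))  # snapshot of the origin data

    def fence(self) -> None:
        """Analogous to MPI_Win_fence: complete all queued operations."""
        for op in self._pending:
            if op[0] == "get":
                _, target, offset, count = op
                target[:count] = self.buffer[offset:offset + count]
            else:
                _, origin, offset = op
                self.buffer[offset:offset + len(origin)] = origin
        self._pending.clear()


win = MockFenceWindow(4)
# Parent -> child: local copy, fence, remote get, fence.
win.local_copy_in([1.0, 2.0, 3.0, 4.0])
win.fence()
child_data = [0.0] * 4
win.get(child_data, 0, 4)
print(child_data)   # not yet valid; the mock leaves it untouched
win.fence()
print(child_data)   # [1.0, 2.0, 3.0, 4.0]
# Child -> parent: remote put, fence, local copy out of the window.
win.put([9.0, 8.0], 0)
win.fence()
parent_data = win.buffer[:2]
print(parent_data)  # [9.0, 8.0]
```

Because only the child issues RMA calls in both directions, the parent merely copies data into and out of its window, which keeps the interpolation and anterpolation work on the child side, as described above.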

Should higher interpolation accuracy across the boundaries be sought, the following considerations are relevant. As stated by

As stated above, the quadratic

The PALM model system is freely available at

Coordination of the study was done by AH and SR. Design and implementation of the inter-model communication was done by KK. Theoretical considerations and implementation of the nesting interface were done mainly by AH, with contributions from SR, MA, MS, and BM. Simulations, post-processing, and analysis of model results were done by AH, MS, MA, BM, CK, FB, GT, and NM. Drafting of the manuscript was done by AH, MA, MS, and BM. Revision of the manuscript was done by all authors.

The authors declare that they have no conflict of interest.

Test runs with PALM have been performed at the supercomputers of the North-German Supercomputing Alliance (HLRN), Germany, and CSC – IT Center for Science, Finland. We wish to thank the anonymous referees, as well as Sebastian Giersch at Leibniz Universität Hannover and Jukka-Pekka Keskinen at the Finnish Meteorological Institute, for their help in improving the manuscript.

This research has been supported by the Academy of Finland (grant no. 277664), the Federal German Ministry of Education and Research (BMBF) (grant no. 01LP1601), and the Federal German Ministry of Economy and Energy (BMU) (grant no. 0325719C).

This paper was edited by Paul Ullrich and reviewed by two anonymous referees.