An N -dimensional Fortran interpolation programme (NterGeo.v2020a) for geophysics sciences – application to a back-trajectory programme (Backplumes.v2020r1) using CHIMERE or WRF outputs

An interpolation programme coded in Fortran for irregular N-dimensional cases is presented and freely available. The need for interpolation procedures over irregular meshes or matrices with interdependent input data dimensions is frequent in geophysical models. Also, these models often embed look-up tables of physics or chemistry modules. Fortran is a fast and powerful language and is highly portable. It is easy to interface models written in Fortran with each other. Our programme does not need any libraries; it is written in standard Fortran and tested with two usual compilers. The programme is fast and competitive compared to current Python libraries. A normalization option is provided for cases where the dimensions carry different types of units. Some tests and examples are provided and available in the code package. Moreover, a geophysical application embedding this interpolation programme is provided and discussed; it consists in determining back trajectories using chemistry-transport or mesoscale meteorological model outputs, respectively, from the widely used CHIMERE and Weather Research and Forecasting (WRF) models.


Introduction
Interpolation is commonly used in geophysical sciences for post-processing, e.g. to evaluate model performance against ground station observations. The NetCDF Operators (NCO) library (Zender, 2008) is commonly used in its recent version (v4.9.2) for horizontal and vertical interpolations to manage climate model outputs. The most frequent need is to interpolate in the 3-D spatial dimensions and time, and therefore in four dimensions. Fortran is extensively used for atmosphere modelling software (Sun and Grimmond, 2019; e.g. the Weather Research and Forecasting model, WRF, Skamarock et al., 2008; the Geophysical Fluid Dynamics Laboratory atmospheric component version 3, GFDL AM3, Donner et al., 2011). More generally, geophysical models can use look-up tables of complex modules instead of a full coupling strategy between these modules, as is the case in the CHIMERE model (Mailler et al., 2017) with the embedded ISORROPIA module dealing with chemistry and thermodynamics (Nenes et al., 1998, 1999). In such a case, the look-up table approximating the module can easily exceed five dimensions. In parallel, artificial intelligence methods are being developed that explore the behaviour of complex model outputs and require fast interpolation methods. While more recent languages like Python are used in the scientific community, Fortran remains widely used in the geophysics and engineering communities and is known as one of the fastest languages in execution time, performing well on array handling and parallelization and, above all, being highly portable. Some benchmarks are available online to evaluate the performance of languages on simple to complex operations (Kouatchou, 2018).
The parameterization techniques proposed to manage aerosol-droplet microphysical schemes (Rap et al., 2009) can employ either the modified Shepard interpolation method (Shepard, 1968) or the Hardy multiquadric interpolation method (Hardy, 1971, 1990); the numerical results obtained show that both methods provide realistic results for a wide range of aerosol mass loadings. For the climate community, a comparison of six methods for the interpolation of daily European climate data is proposed by Hofstra et al. (2008); some of these methods are kriging-like, with the capability to use co-predictors such as the topography.
A Python procedure called scipy.interpolate.griddata is freely available (Scipy, 2014). Unfortunately, this programme is not well adapted to our problem: because it is designed to manage fully unstructured datasets, it may not be sufficiently optimized for our objective. The goal of this paper is to present a programme to interpolate in a grid or a matrix which can be irregular (varying intervals) but structured, with the possibility of interdependent dimensions (e.g. longitude interval edges which depend on longitude, latitude, altitude and time). We think this type of programme can easily be implemented within models or used to manage model outputs for post-processing. In short, the novelty of this programme is to fill the gap, for a general dimension N, between the treatment of very complex unstructured meshes and that of simple regular grids.
In order to quantify the impact of such a new interpolation programme and show examples of its use, it is implemented in the Backplumes back-trajectory model, developed by the same team as the CHIMERE model (Mailler et al., 2017). This host model is well fit for this implementation, because the most important part of its calculation is an interpolation of a point in a model grid box. This paper describes (i) the methodology and the content of the interpolation programme package NterGeo and (ii) an application of this programme embedded in the new back-trajectory programme, Backplumes. These two codes are freely available (see code availability section).

Development of the interpolation programme
The NterGeo programme is designed to explore irregular but structured grids or look-up tables defined by a unique size for each dimension, which can of course differ from one dimension to another. The space intervals can vary along a dimension, and the grid interval edges in each dimension can depend on other dimensions. Two versions have been developed: (i) a version for "regular" arrays with independent dimensions and (ii) a "general" version for possibly interdependent dimensions, e.g. to handle 3-D meshes with time-varying spatial coordinates. The code does not need any libraries and is written in standard Fortran. Our interpolation code was tested with gfortran (GNU Fortran project) and ifort (Intel). Since our programme does not use compiler-specific options or functions, there is no reason to expect limitations or errors with other compilers. The top calling shell script in the package provides two sets of options for "production" and "debugging" modes. Assuming an array X, the result of the function f transforming the array X into the array Y in R can be expressed as

Y(x_1, ..., x_N) = f(X(x_1, ..., x_N)).   (1)

N is the dimension of the array, and x_i, i ∈ [1, N], are the coordinates of the point X at which we want to interpolate.

The programme for regular grids
A programme (interpolation_regular.F90) for regular grids (i.e. with independent dimensions) is available.
To handle this type of grid, a classical multilinear interpolation is performed. Figure 1 shows the variables for N = 3 defined hereafter in the section. For the particular case of a regular grid with independent dimensions, the result Ỹ of the multilinear interpolation over the 2^N identified neighbours can be expressed as

Ỹ = Σ_{k=0}^{2^N − 1} Λ_k Y_k,

with δ_i the binary digit equal to 0 or 1, and the weights w_i^{δ_i} for i ∈ [1, N] defined as

w_i^0 = (θ_i^1 − x_i) / (θ_i^1 − θ_i^0),   w_i^1 = (x_i − θ_i^0) / (θ_i^1 − θ_i^0).

Θ_i is the list of interval edges on dimension i and does not depend on the other dimensions. θ_i^{δ_i} indicates the bottom (δ_i = 0) and top (δ_i = 1) edges on each dimension i ∈ [1, N], so that x_i ∈ ]θ_i^0, θ_i^1]. Y_k is a one-dimensional array with 2^N elements storing the value Y of the function at the identified neighbours, with k ∈ [0, 2^N − 1]. The tuple (δ_N ... δ_i ... δ_1) is the binary representation of the integer k = Σ_{i=1}^{N} δ_i × 2^{i−1}. The coefficients Λ_k = w_N^{δ_N} ... w_i^{δ_i} ... w_1^{δ_1}, products of the weighting factors in each direction, can be handled as a binary sequence, which is convenient for a compact and optimized Fortran programming strategy in the regular grid version of the code (Appendix B).
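As an illustration, the binary-index formulation above can be sketched in Python (an illustrative reimplementation with assumed data structures, not the Fortran code of the package):

```python
import itertools

def multilinear(theta, y, x):
    """Multilinear interpolation on a regular (per-dimension) grid.

    theta : list of N sorted 1-D coordinate lists (interval edges per dimension)
    y     : dict mapping an N-tuple of edge indices to the function value
    x     : point of interest (length-N sequence)
    """
    N = len(x)
    lo, w = [], []  # lower edge index and weight pair (w0, w1) per dimension
    for i in range(N):
        edges = theta[i]
        # find j so that x[i] lies in ]edges[j], edges[j+1]] (clipped at the top)
        j = next((k for k in range(len(edges) - 1) if x[i] <= edges[k + 1]),
                 len(edges) - 2)
        t0, t1 = edges[j], edges[j + 1]
        w1 = (x[i] - t0) / (t1 - t0)  # weight of the top edge (delta_i = 1)
        lo.append(j)
        w.append((1.0 - w1, w1))
    # sum over the 2**N neighbours, k encoded by the tuple of binary digits delta_i
    result = 0.0
    for delta in itertools.product((0, 1), repeat=N):
        coeff = 1.0
        for i, d in enumerate(delta):
            coeff *= w[i][d]
        result += coeff * y[tuple(lo[i] + delta[i] for i in range(N))]
    return result
```

For a function that is itself multilinear, the interpolation is exact, which makes a convenient sanity check of the weights.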

The general programme
Considering the general programme called interpolation_general.F90, the coordinates of the edge points are stored in a one-dimensional array of n = Π_{i=1}^{N} I_i elements, with I_i the number of edges on each dimension i. The tuple of coordinates (j_1, ..., j_N) of an interval edge θ_k^i, with j_i the indexed coordinate on dimension i, is transformed into a one-dimensional array indexed on k ∈ [1, n] by

k = 1 + Σ_{i=1}^{N} (j_i − 1) Π_{m=0}^{i−1} I_m,

with I_0 = 1 for initialization. Once the nearest neighbour is found, the result Ỹ of the interpolation is a weighting of the 2^N closest vertices using a Shepard interpolation (Shepard, 1968) based on inverse-distance calculations:

Ỹ = Σ_k ω_k Y_k / Σ_k ω_k, with ω_k = 1 / d_k,

with Y_k = f(ϒ_k) the value of the function f at the neighbour ϒ_k of coordinates (θ_k^1, ..., θ_k^N). The distance d_k between the point of interest of coordinates (x_1, ..., x_N) and the neighbour k ∈ [1, n] is the p-distance

d_k = ( Σ_{i=1}^{N} |x_i − θ_k^i|^p )^{1/p}.

The previous formulas are valid for d_k ≠ 0; in the case of d_k = 0, the procedure stops and exits, returning the exact value of the corresponding data of the nearest neighbour. For a distorted mesh or matrix, or for dimensions with different units (e.g. mixing time with length), a hard-coded option (norm=.true. or .false.) is also available to normalize the intervals with an average interval value Δ̄_i for the calculation of distances, so that

d_k = ( Σ_{i=1}^{N} |(x_i − θ_k^i) / Δ̄_i|^p )^{1/p}.   (9)

Computation strategy for the general programme

The list of input/output arguments is provided in Appendix C. In the main programme calling the subroutine, the key point is to first transform the N-dimensional matrix into a 1-D array. An example of a main programme calling the subroutine is provided in the code package. The computation strategy in the subroutine can be broken down into the following sequential steps:

i. Find the nearest neighbour of the input data by minimizing a distance with a simple incremental method stepping every ±1 coordinates on each dimension (detailed later in this section).
ii. Scan the surroundings of the nearest point within the matrix, at ±1 step on each dimension, and store the corresponding block of input data to be tested. The size of the block is therefore (1 + 2 × 1)^N = 3^N but can be extended to (1 + 2 × 2)^N = 5^N if the scanning process is increased to ±2 on each dimension (hard-coded option iconf of 1 or 2 in the declaration block).
iii. Calculate the distance to the previously selected input data. A p-distance concept is adopted (hard-coded option pnum in the declaration block). The value p should be greater than or equal to 1 so that the Minkowski inequality holds and the p-distance is a true metric.
iv. Sort the previous block of data in ascending order and stop the sorting process once the first 2^N points are selected. The code offers the possibility of using only the first N + 1 neighbours (hard-coded option neighb in the declaration block), which is sufficient and faster in most cases.
v. Calculate the weights and then the final result.
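Steps ii-v can be illustrated by a short Python sketch (hypothetical names and dictionary-based storage; the Fortran subroutine operates on flattened 1-D arrays instead):

```python
import itertools

def shepard_interpolate(grid, values, x, nearest, iconf=1, pnum=2.0, nneigh=None):
    """Shepard (inverse-distance) interpolation around a known nearest neighbour.

    grid    : dict mapping index tuples to vertex coordinate tuples
    values  : dict mapping the same index tuples to function values
    x       : point of interest
    nearest : index tuple of the nearest neighbour (step i, found beforehand)
    """
    N = len(x)
    nneigh = nneigh or 2 ** N  # the code may also use only N + 1 neighbours
    # step ii: scan a block of (1 + 2*iconf)**N indices around the nearest point
    block = []
    for offset in itertools.product(range(-iconf, iconf + 1), repeat=N):
        idx = tuple(nearest[i] + offset[i] for i in range(N))
        if idx in grid:
            block.append(idx)
    # step iii: p-distance from x to each candidate vertex
    def pdist(idx):
        return sum(abs(x[i] - grid[idx][i]) ** pnum for i in range(N)) ** (1.0 / pnum)
    # step iv: ascending sort, keep only the first nneigh vertices
    block.sort(key=pdist)
    block = block[:nneigh]
    # step v: inverse-distance weights; an exact hit returns the stored value
    d = [pdist(idx) for idx in block]
    if d[0] == 0.0:
        return values[block[0]]
    w = [1.0 / dk for dk in d]
    return sum(wk * values[idx] for wk, idx in zip(w, block)) / sum(w)
```

The early return on a zero distance mirrors the exit condition of the Fortran procedure when the target coincides with a grid vertex.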
The first step consisting in finding the first neighbour is the trickiest and is broken down into several steps. Figure 2 displays an example in 2-D of the step-by-step procedure to find the nearest neighbour.
i. The procedure initializes the process starting from the first point of the input data grid or taken from the last closest point if given in an argument as a non-null value.
ii. A delta of coordinates is applied based on an average delta on each dimension to improve the initialization. This computation step of delta is externalized as it can be time consuming and should be done once for all target points at which we want to interpolate.
iii. A test between the target value and the input data grid point coordinates determines the ±1 steps to add on each dimension (see Fig. 2 for an example in 2-D).
iv. If the grid point falls on the edges or outside the borders, the closest coordinates within the matrix are selected.
v. A test on the p-distance between the running point and the target is performed: if the distance calculated at iteration N_it equals the distance calculated at iteration N_it − 2, the closest point has been found.
vi. If the distance is larger than the characteristic distance of the cell, the point is considered to be outside the borders of the input data grid. Therefore, the code allows a slight extrapolation if the target point is not too far from the borders.
vii. At this stage, the procedure can stop if the distance to the closest vertex is 0, returning to the main programme with the exact value of the input data grid.
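A minimal Python sketch of this incremental search, for an independent-dimension grid and a Euclidean distance (the names, the warm start and the best-point bookkeeping are illustrative assumptions):

```python
def find_nearest(edges, x, start=None, max_iter=1000):
    """Walk the grid +/-1 index per dimension towards the point x.

    edges : list of N sorted 1-D coordinate lists (one per dimension)
    x     : target point
    start : optional index tuple from a previous call (warm start, step i)
    Returns the index tuple of the nearest grid vertex visited.
    """
    N = len(x)
    idx = list(start) if start else [0] * N

    def dist(j):
        return sum((x[i] - edges[i][j[i]]) ** 2 for i in range(N)) ** 0.5

    best, best_d = tuple(idx), dist(idx)
    history = [best_d]
    for _ in range(max_iter):
        for i in range(N):
            # step towards the target, clipped at the borders (step iv)
            if x[i] > edges[i][idx[i]] and idx[i] < len(edges[i]) - 1:
                idx[i] += 1
            elif x[i] < edges[i][idx[i]] and idx[i] > 0:
                idx[i] -= 1
        d = dist(idx)
        if d < best_d:
            best, best_d = tuple(idx), d
        history.append(d)
        # step v: stop when the distance repeats with period 2 (oscillation around x)
        if len(history) >= 3 and history[-1] == history[-3]:
            break
    return best
```

When the walk oscillates between two vertices straddling the target, the best point seen so far is returned; the subsequent block scan (step ii of the main procedure) then recovers any neighbour this local search may have missed.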

Visual example in 2-D for a regular grid
As an example to visualize the capacity of the general programme, the 2-D test function from the Scipy (2014) documentation is used to test our procedure. Our input data grid is a regular grid with regular intervals of 0.02 from 0 to 1 for x_1 and x_2, therefore with 51 points on each dimension. We propose to interpolate on a finer regular grid with n = 100 × 100, 200 × 200 and 300 × 300 points. For these three interpolation cases, a normalized mean square error (NMSE) of the result Ỹ_j at each grid point j can be calculated against the true value Y_j of the function as

NMSE = (1/n) Σ_{j=1}^{n} (Ỹ_j − Y_j)² / Ȳ²,

with Ȳ the mean value (1/n) Σ_{j=1}^{n} Y_j. For the three cases, the CPU time for the interpolation is evaluated and displayed in Table 1 for Machine 1 (Appendix E). As expected, the time consumption is proportional to the number of points at which to interpolate. Figure 3 displays the evolution of the NMSE with the parameter p of the p-distance definition. There is a discontinuity of the NMSE from p = 1 to p = 1 + ε, followed by a slight asymptotic increase with p (Fig. 4). The NMSE decreases with the number of points, although a slight increase is observed from 200 × 200 to 300 × 300.
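The NMSE can be computed directly; the sketch below is a transcription of the definition above, assuming the normalization by the squared mean value:

```python
def nmse(y_true, y_interp):
    """Normalized mean square error of interpolated values against the truth.

    Implements NMSE = (1/n) * sum((y_interp - y_true)**2) / mean(y_true)**2,
    i.e. the error definition assumed in the text above.
    """
    n = len(y_true)
    mean = sum(y_true) / n
    return sum((yi - yt) ** 2 for yt, yi in zip(y_true, y_interp)) / (n * mean ** 2)
```

A perfect interpolation gives NMSE = 0, and the normalization makes the score comparable across functions of different magnitudes.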

Example in 5-D for a regular grid
Still using the general programme, an example in 5-D (N = 5) is proposed using a test function of x_1, x_2, x_3, x_4, x_5 ∈ [0, 1]. The input data grid is a regular grid of I_i = 35 interval edges on each dimension i ∈ [1, 5], i.e. 35^5 = 52 521 875 grid points. The goal is to find the results on a coarse grid of nine elements on each dimension, i.e. 9^5 = 59 049 grid points. This case is an opportunity to test the influence of the number of neighbours on the result. In our case, the parameter p of the p-distance is set to p = 2. The interpolation provides a better NMSE for our function with fewer neighbours (case N + 1), and obviously with a lower CPU time (Table 2). This likely depends on the type of function to interpolate.
Another test with the 5-D case is performed to assess the influence of the normalization defined in Eq. (9) (flag norm), by defining an irregular grid, still with 35^5 = 52 521 875 input data points, but with (i) random interval values and (ii) one dimension depending on another. The input grid is defined in Appendix D and provided in the code package. With a similar order of magnitude of consumed CPU time, norm=.true. produces an NMSE of 0.499 % compared to 0.822 % for norm=.false. There is thus an added value in using such a normalization at a comparable CPU cost (rising from 2.68 to 3.44 s in our case).

Comparison with Python for a regular grid
The code has been tested against the Python procedure scipy.interpolate.griddata, freely available from Scipy (2014). The input data grid is a regular grid of I_i = 35 interval edges on each dimension i ∈ [1, 3], i.e. 35^3 = 42 875 grid points. The goal is to find the results on a coarse grid of nine elements on each dimension, i.e. 9^3 = 729 grid points. A 3-D case has been used for this test because the Python library was not able to work with very large datasets (overflow error), while our programme worked perfectly. Here, scipy.interpolate.griddata is used with the linear interpolation option, while our method is configured with p = 2. Table 3 clearly shows that the Fortran code is faster than the Python library. However, the linear interpolation method seems to provide a higher accuracy than the inverse-distance method embedded in our programme. Nevertheless, the error produced by our method remains acceptable.

The Backplumes back-trajectory programme

An advantage of Backplumes for WRF and CHIMERE users is that the code directly reads the output files of these models. Being developed by the CHIMERE developer team, the code is completely homogeneous with CHIMERE in terms of numerical libraries. Another advantage is that the code is very fast and calculates hundreds of trajectories in a few minutes. Using the wind fields of WRF or CHIMERE, and running on the same grid, the back-trajectory results are fully consistent with the simulations done by these models.
Backplumes is dedicated to calculating transport but not chemistry: only passive air parcels (or tracers) are released. However, a distinction can be made between gaseous and particulate tracers: for the latter, a settling velocity is calculated to obtain a more realistic trajectory. The model is easy to use and lightweight because only a small set of meteorological parameters is required. These parameters are described in Table G1 for WRF and CHIMERE.
The first step of the calculation is to choose a target location as a starting point. The user must select a date, longitude, latitude and altitude, obviously included in the modelled domain and within the modelled period. From this starting point, the model calculates trajectories back in time. The number of trajectories is up to the user and may vary from one to several hundred tracers.
At each time step and for each trajectory, the position of the air mass is estimated by subtracting the pathway travelled, in longitude λ, latitude φ and altitude z, from the current position. To do so, all necessary variables are interpolated with the NterGeo.v2020a interpolation programme described in the previous section. The calculation is described in Appendix G.
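The position update can be sketched in Python as follows (an illustrative approximation on a spherical Earth; the names and the simple equirectangular conversion are assumptions, not the actual Backplumes formulation of Appendix G):

```python
import math

R_EARTH = 6371000.0  # mean Earth radius (m)

def step_back(lon, lat, z, u, v, w, dt):
    """Move an air parcel one time step back along the interpolated wind.

    lon, lat in degrees, z in metres; u, v, w wind components in m/s; dt in s.
    Returns the estimated upstream position (lon, lat, z).
    """
    # convert the horizontal displacement (m) to degrees of latitude/longitude
    dlat = math.degrees(v * dt / R_EARTH)
    dlon = math.degrees(u * dt / (R_EARTH * math.cos(math.radians(lat))))
    # subtract the pathway travelled from the current position
    return lon - dlon, lat - dlat, z - w * dt
```

All of u, v and w are taken at the parcel position through the NterGeo interpolation, so the accuracy of the trajectory depends directly on the interpolation quality.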
In order to respect the Courant-Friedrichs-Lewy (CFL) condition, a sub-time step may be calculated. If the input data are provided hourly (as in many regional models), the meteorological variables are interpolated between two consecutive hours to obtain refined input data.
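A minimal sketch of such CFL-based sub-stepping, with assumed parameter names (not the actual Backplumes variables):

```python
import math

def substep(dt, u, v, w, dx, dy, dz, cfl_max=1.0):
    """Split the input-data time step so each sub-step respects the CFL criterion.

    dt : input-data time step (s); u, v, w : wind components (m/s);
    dx, dy, dz : grid spacings (m). Returns (n_sub, dt_sub).
    """
    # Courant number accumulated over the three directions for the full step
    courant = abs(u) * dt / dx + abs(v) * dt / dy + abs(w) * dt / dz
    n_sub = max(1, math.ceil(courant / cfl_max))
    return n_sub, dt / n_sub
```

With an hourly time step, a 60 km horizontal grid and moderate winds, this typically yields only a few sub-steps per hour.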
The goal of Backplumes is to estimate all possible back trajectories. Starting from one unique point, it is therefore necessary to add a pseudo-turbulence to the calculation of the altitude. Depending on the vertical position of the tracer, several hypotheses are made. Two parameters are checked for each tracer and each time step: (i) the boundary layer height indicates whether the tracer is in the boundary layer or above it in the free troposphere, and (ii) the surface sensible heat flux indicates whether the atmosphere is stable or unstable.
When the tracer is diagnosed in the boundary layer, there are two cases: the boundary layer is stable or unstable. If the boundary layer is stable (Q_0 < 0), the tracer stays in the boundary layer at the same altitude, and the new vertical position of the tracer is z_{t−Δt} = z_t. If the boundary layer is unstable (Q_0 > 0), the tracer is considered to be in the convective boundary layer and may have been located at any level of this boundary layer at the previous time. Therefore, a random function is applied to reproduce a potential vertical mixing.
The random function "Rand" calculates a coefficient between 0 and 1 to represent stochastic vertical transport of the tracer.
It is considered that 15 min is representative of a well-mixed convective layer (Stull, 1988). If the time step is larger than 15 min, the random function is applied over the whole boundary layer depth. But if the time step is less than 15 min, the vertical mixing is reduced to the vicinity of the current position of the tracer, within a layer of depth Δz = ½ (z_{k−1,t} + z_{k+1,t}), where k is the vertical model level corresponding to z_t.
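The stochastic vertical repositioning in the boundary layer can be sketched as follows (a simplified Python illustration of the logic described above; the function and variable names are hypothetical, and the reduced-mixing branch is an assumed reading of the vicinity rule):

```python
import random

def new_altitude(z, blh, q0, dt, levels, rng=random):
    """Diagnose the tracer altitude one step back in time inside the boundary layer.

    z      : current altitude (m), assumed below the boundary layer height blh
    q0     : surface sensible heat flux (W/m2); q0 < 0 means a stable layer
    dt     : time step (s); 15 min is taken as the convective mixing time scale
    levels : sorted model level altitudes (m), used for the reduced-mixing branch
    """
    if q0 < 0:
        return z  # stable boundary layer: the tracer keeps its altitude
    if dt >= 15 * 60:
        return rng.random() * blh  # well-mixed layer: any altitude below blh
    # short time step: mixing restricted to the vicinity of the current level
    k = min(range(len(levels)), key=lambda i: abs(levels[i] - z))
    lo = levels[max(k - 1, 0)]
    hi = levels[min(k + 1, len(levels) - 1)]
    return lo + rng.random() * (hi - lo)
```

Seeding the random generator makes an ensemble of trajectories reproducible while still sampling the mixed layer.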
In the free troposphere, the evolution of the tracer is considered to be driven by the vertical wind component. A random function is applied to estimate its possible vertical motion, with values between 0 and w/2 m s⁻¹, representative of the possible values of vertical wind speed in the troposphere (Stull, 1988). The vertical variability of the tracer's position in the free troposphere is then calculated by diagnosing the vertical velocity accordingly.

Examples of back-trajectory computations
An example is presented for the same case with both the WRF and CHIMERE models. The difference between the two models is the number of vertical levels (35 for WRF and 20 for CHIMERE, from the surface to 200 hPa). The online modelling system WRF-CHIMERE is used, meaning that the horizontal grid is the same (a large domain including Europe and Africa, with Δx = Δy = 60 km). The wind field is the same for both models: CHIMERE directly uses the wind field calculated by WRF. The boundary layer height differs between the two models, with WRF using the Hong et al. (2006) scheme and CHIMERE using the Troen and Mahrt (1986) scheme. The surface sensible heat flux is the same for both models, as CHIMERE uses the flux calculated by WRF. WRF has more vertical model levels than CHIMERE; meteorological fields are thus interpolated from WRF to CHIMERE, which impacts the horizontal and vertical wind fields. Figure 5 presents the results of back trajectories launched on 10 August 2013 at 12:00 UTC. The location is at longitude 10° E and latitude 25° N, with an altitude of 0 m a.g.l. This location is of no particular scientific interest but lies in the middle of the domain, in order to obtain the longest trajectories. The complete duration of the trajectories represents 10 d back in time. Overall, 120 trajectories are launched at the same position and time. They are randomly mixed when they are in the boundary layer to represent the mixing and the diffusion.
The most important part of the plume comes from the north of the starting point. For this main plume, the calculation is similar between the two models. Another large part of the plume is modelled to the east of the starting point. However, this fraction is mainly modelled with WRF and not with CHIMERE, where only a few trajectories are diagnosed. One possible explanation may be found by analysing the vertical transport of the trajectories. Figure 6 presents all the plumes displayed in the previous figure but projected along the same time-altitude axis. The differences between the two Backplumes results are mainly due to the calculation of the boundary layer height. Where WRF diagnoses an altitude of ≈ 3000 m, CHIMERE diagnoses ≈ 2000 m, leading to different wind directions and speeds. This implies a split of the plumes with WRF but not with CHIMERE and illustrates the sensitivity of the result to the driver model. In both cases, however, the answer is clearly that the main contributions to the air masses located at the starting point come from the north-east, crossing Tunisia and then the Mediterranean Sea and Europe. The main difference between the two calculations is the eastern part of the plume, which is more intense with WRF than with CHIMERE.

Figure 5. Back trajectories calculated using CHIMERE and WRF modelled meteorological fields. The starting point is at longitude 10° E and latitude 25° N, with an altitude of 0 m a.g.l., on 10 August 2013 at 12:00 UTC. It corresponds to a case studied during the Chemistry-Aerosol Mediterranean Experiment (ChArMEx) campaign (Menut et al., 2015).

Conclusions
A new interpolation programme written in Fortran has been developed to interpolate on N-dimensional matrices. It has been evaluated for several dimension cases up to N = 5. The code is fast compared to similar Python routines and is easily portable into existing geophysical codes. The interpolation programme works for any dimension N above 2 and is designed to work with irregular but structured grids (characterized by a size for each dimension) or look-up tables. Already used in its "regular" version in CHIMERE, the "general" programme has been tested on a new real application which calculates air mass back trajectories from two widely used atmospheric models: CHIMERE and WRF. This interpolation programme can be used for any application in geophysics and engineering sciences, and also to explore large structured matrices.

Code availability. The current versions of the models are freely available. The exact version of the model used to produce the results in this paper is archived on Zenodo for NterGeo at https://doi.org/10.5281/zenodo.3733278 (Bessagnet, 2020) under the GNU General Public License v3.0 or later, as are input data and scripts to run the model and produce the plots for all the simulations presented in this paper. The Backplumes model is an open-source code and is available on the CHIMERE model website: https://www.lmd.polytechnique.fr/~menut/backplumes.php (last access: 21 December 2020, Menut, 2020).
Author contributions. BB developed the code. LM and BB codeveloped and implemented the code in the Backplumes.v2020r1 model. MB supported BB for the developments.
Competing interests. The authors declare that they have no conflict of interest.
Financial support. This research was funded by the DGA (French Directorate General of Armaments; grant no. 2018 60 0074) in the framework of the NETDESA project.
Review statement. This paper was edited by Juan Antonio Añel and reviewed by two anonymous referees.