This work presents a software package for the interpolation of climatological variables, such as temperature and precipitation, using kriging techniques. The purposes of the paper are (1) to present a geostatistical software package that is easy to use and easy to plug in to a hydrological model; (2) to provide a practical example of carefully designed software from the perspective of reproducible research; and (3) to demonstrate the quality of the results of the software and thus offer a reliable alternative to other, more traditional tools. A total of 11 types of theoretical semivariograms and four types of kriging were implemented and gathered into Object Modeling System-compliant components. The package provides real-time optimization for semivariogram and kriging parameters. The software was tested using a year's worth of hourly temperature readings and a rain storm event (11 h) recorded in 2008 and retrieved from 97 meteorological stations in the Isarco River basin, Italy. For both variables, good interpolation results were obtained and then compared to the results from the R package gstat.

Meteorological forcing data such as rainfall, temperature, and
solar radiation are the dominant controlling factors for the hydrological
cycle, energy balance, and ecosystem processes

Generally, kriging can be applied to a wide range of datasets

However, kriging can be computationally more demanding than other
interpolation techniques. To overcome this problem, most applications that
implement kriging interpolators use either long time series with long
time steps, such as
daily,

Based on these premises, we set ourselves two objectives with this work. The
first was to provide an efficient and precise tool for spatial estimation
and interpolation of environmental quantities. The second was to adopt
an implementation strategy that favors the usability of the software, its
maintenance, its inspection, and its extension and, hopefully, makes
scientific work easier. This second goal falls within the contemporary efforts
to promote open science (e.g.,

The Spatial Interpolation Kriging package (version 0.9.8) (GEOframe-SIK,
henceforth simply SIK) is presented here. It is a package that makes
estimates of any spatially distributed environmental data at hourly steps (or
sub-hourly when it is reasonable). SIK is designed according to the Object
Modeling System v3 (OMS3) framework

The first is used for the production of the experimental semivariograms.

The second is used for the production of the theoretical semivariograms.

The third is used for the kriging interpolation.

The last is used for an automatic and easy jackknife resampling to assess the error of estimates.

SIK inherits some previous code used, for instance, in

To make the code more flexible, easily extensible, and maintainable, SIK was
completely refactored and a systematic use of design patterns (DPs)

Several geostatistical tools have been made available to the scientific
community. Among them, PyKrige (

As well as being open source, SIK is the only Java-based and component-based software of those mentioned above. Moreover, it offers a quick way to plug in to hydrological models and automatic calibration algorithms. We decided to compare the performance of SIK with that of R gstat since the latter is one of the most widely used tools in the scientific community.

The present paper is organized as follows. First, some preliminary
information on kriging interpolation is given in Sect.

Kriging is a group of geostatistical techniques used to interpolate the value
of random fields based on spatial autocorrelation of measured data

Three main variants of kriging can be distinguished

simple kriging (SK), which considers the mean to be known and constant throughout the study area;

ordinary kriging (OK), which accounts for local fluctuations of the mean, limiting the stationarity to the local neighborhood (in this case the mean is unknown);

kriging with a trend model (here detrended kriging, DK), which considers that the local mean varies within the local neighborhood.

The trend can be, for example, a linear regression model between the
investigated variables and an auxiliary variable, such as elevation or slope.
According to the procedure shown in

Variants of OK and DK are local ordinary kriging (LOK) and local detrended kriging (LDK), respectively. In this case the estimate is only influenced by the measurements belonging to a neighborhood, which are usually defined either in a maximum searching radius or as a set number of stations which are closer to the interpolation point. In the LDK case, the trend is estimated locally too, and therefore it can take local trend variations into account.
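The two neighborhood definitions mentioned above (a maximum search radius or a fixed number of closest stations) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the SIK Java implementation; the function name `select_neighbors` and the tuple layout of the stations are assumptions made only for this example.

```python
import math


def select_neighbors(stations, target, max_neighbors=None, max_radius=None):
    """Return the stations used for a local kriging estimate at `target`.

    stations: list of (x, y, value) tuples; target: (x, y).
    Either criterion, or both, can be given: a maximum search radius
    and/or a fixed number of closest stations.
    """
    with_dist = [
        (math.hypot(x - target[0], y - target[1]), (x, y, value))
        for x, y, value in stations
    ]
    # Sort by distance only, so that ties keep their original order.
    with_dist.sort(key=lambda item: item[0])
    if max_radius is not None:
        with_dist = [item for item in with_dist if item[0] <= max_radius]
    if max_neighbors is not None:
        with_dist = with_dist[:max_neighbors]
    return [station for _, station in with_dist]


# Hypothetical stations: (x, y, measured value).
stations = [(0.0, 0.0, 10.0), (1.0, 0.0, 12.0), (5.0, 5.0, 20.0), (0.0, 2.0, 11.0)]
nearest = select_neighbors(stations, target=(0.5, 0.5), max_neighbors=2)
within = select_neighbors(stations, target=(0.5, 0.5), max_radius=2.0)
```

In a local kriging run, only the returned subset would enter the kriging system for that interpolation point.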

The SIK package implements both OK and DK since the local mean may vary
significantly over the study area and the SK assumption about the mean could
be too strict
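As an illustration of detrending, the sketch below fits a linear trend between a variable and elevation, computes the residuals that would then be interpolated by kriging, and adds the trend back at the target elevation. This is one common form of the procedure and is assumed here for illustration; it is written in Python rather than Java, and the function names and sample values are hypothetical.

```python
def fit_linear_trend(elevations, values):
    """Ordinary least-squares fit of value = intercept + slope * elevation."""
    n = len(values)
    mean_z = sum(elevations) / n
    mean_v = sum(values) / n
    sxx = sum((z - mean_z) ** 2 for z in elevations)
    sxy = sum((z - mean_z) * (v - mean_v) for z, v in zip(elevations, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_z
    return intercept, slope


def add_trend_back(kriged_residual, target_elevation, intercept, slope):
    """Final estimate: trend at the target elevation plus the kriged residual."""
    return intercept + slope * target_elevation + kriged_residual


# Hypothetical station elevations (m) and temperatures (degrees C).
elevations = [250.0, 800.0, 1500.0, 2100.0]
temperatures = [14.0, 10.5, 6.0, 2.0]

intercept, slope = fit_linear_trend(elevations, temperatures)
# The residuals, not the raw values, are what kriging interpolates.
residuals = [t - (intercept + slope * z) for z, t in zip(elevations, temperatures)]
estimate = add_trend_back(0.0, 1000.0, intercept, slope)
```

In the local variant, the same fit would be repeated using only the stations in the neighborhood of each interpolation point.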

The workflow of the main algorithm for solving an interpolation problem with
kriging can be summarized in the following steps:

get the data from gauges,

build the empirical semivariogram,

fit a theoretical model to the semivariogram,

use the theoretical model for solving the kriging system,

produce continuous surface maps or pointwise time series of the quantity desired in any point of the domain,

calculate estimation errors.
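The steps that solve the kriging system and produce an estimate can be sketched for ordinary kriging as follows. This is a minimal Python illustration under stated assumptions, not the SIK Java code: the exponential model, its parameterization (the sill is treated as a partial sill), the pure-Python linear solver, and all names are chosen only for the example.

```python
import math


def exponential(h, nugget, sill, range_):
    """Exponential semivariogram (assumed parameterization); gamma(0) = 0."""
    if h == 0.0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-h / range_))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ordinary_kriging(points, values, target, gamma):
    """Return (estimate, kriging variance, weights) at `target`."""
    n = len(points)
    # Semivariances between stations, bordered by the unbiasedness constraint.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    estimate = sum(w * v for w, v in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return estimate, variance, weights


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [10.0, 12.0, 11.0, 13.0]
gamma = lambda h: exponential(h, nugget=0.0, sill=1.0, range_=2.0)

estimate, variance, weights = ordinary_kriging(points, values, (0.5, 0.5), gamma)
at_station, _, _ = ordinary_kriging(points, values, (0.0, 0.0), gamma)
```

The weights sum to one because of the constraint row, and the estimate at a station location reproduces the measured value, which is the behavior expected from kriging as an exact interpolator.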

The last step underlines that we are interested not only in estimating a
variable (temperature, rainfall intensity, or other scalars) but also in
evaluating the errors of our estimate. In addition to the spatial variable estimate,
kriging also returns a variance of the estimate. However,

Therefore, to estimate the errors produced by kriging interpolations, we
chose the leave-one-out (LOO) cross-validation technique
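The leave-one-out procedure can be sketched in a few lines: each station is removed in turn, its value is predicted from the remaining stations, and the prediction is compared with the measurement. To keep the example short and self-contained, the interpolator used here is a simple inverse-distance weighting stand-in rather than kriging; any function with the same signature could be passed instead. The code is Python, not the SIK-LOO component, and all names are hypothetical.

```python
import math


def idw(points, values, target, power=2.0):
    """Inverse-distance weighting, used here only as a stand-in interpolator."""
    numerator = 0.0
    denominator = 0.0
    for point, value in zip(points, values):
        distance = math.dist(point, target)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def leave_one_out(points, values, interpolate):
    """Predict each station from all the others and return the predictions."""
    predictions = []
    for i in range(len(points)):
        other_points = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        predictions.append(interpolate(other_points, other_values, points[i]))
    return predictions


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [10.0, 12.0, 11.0, 13.0, 11.5]

predicted = leave_one_out(points, values, idw)
errors = [p - v for p, v in zip(predicted, values)]
```

The resulting pairs of measured and predicted values are what goodness-of-fit indices are then computed on.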

On the basis of the analysis of the mathematical problems and of the use cases delineated in the previous section, the design of the software was organized into four OMS3 components, the logic of which is explained below.

The component-based environmental modeling framework OMS3

In addition to minimizing couplings, the advantage of building within a modular
software framework is the production of code that is more flexible and
easier for third parties to maintain and inspect. Multiple algorithms can
be implemented within the same component or in various components and
inserted in the MS as alternatives. Thus, inside the same chain of tools,
different candidate solutions to the same hydrological problem can be
compared. More details on OMS3 can be found in

Figure

The distance vector, the name, and the parameters for the theoretical
semivariogram models (sill, nugget, and range), are the inputs of SIK-TV.
Particle swarm optimization (PSO)

Using the alternative connection, PSO can be connected to SIK-LOO, which
implements the iterative procedure necessary to estimate errors of
interpolation. Given

Flow chart of the interpolation and validation process, represented with the corresponding OMS components.

Each of the four OMS3 components presented in the previous section can
contain alternative solutions. For example, in SIK-TV the software design
allows for multiple theoretical variogram models, while in the SIK-K
component the four types of kriging listed in Sect.

In principle, we could have implemented a single component for every single
type of variogram and kriging but this would have exploded
the number of software modules to maintain. However, to “close the code to
modification and keep it open to extensions”

In general, DPs implement rules that allow us, for instance, to separate code
parts that are going to vary from those that are going to remain the same.
The adoption of these DPs, once their rationale is understood, makes the code
easier to read and maintain. While largely known among programmers, DPs
are not widely known in the scientific community, which has remained
largely impervious to these techniques, and just a few examples of good
practice can be found in the scientific literature (

The various theoretical semivariogram models or kriging types to be chosen at
run time were encapsulated by using the Simple Factory class

Figure

Implementation of the Java Simple Factory for the choice of the theoretical variogram model in the component SIK-TV.
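The idea behind the Simple Factory can be sketched as follows. The example is written in Python for brevity, whereas SIK is implemented in Java, so the class names, the method names, and the model parameterizations (with the sill treated as a partial sill) are assumptions made for illustration and do not reproduce the SIK-TV code.

```python
import math
from abc import ABC, abstractmethod


class Semivariogram(ABC):
    """Common interface: nugget + partial sill * normalized structure."""

    def __init__(self, nugget, sill, range_):
        self.nugget = nugget
        self.sill = sill  # assumed here to be the partial sill
        self.range_ = range_

    @abstractmethod
    def structure(self, h):
        """Normalized shape, rising from 0 at h = 0 toward 1."""

    def value(self, h):
        # The semivariance is zero at zero lag by definition.
        if h == 0.0:
            return 0.0
        return self.nugget + self.sill * self.structure(h)


class Exponential(Semivariogram):
    def structure(self, h):
        return 1.0 - math.exp(-h / self.range_)


class Gaussian(Semivariogram):
    def structure(self, h):
        return 1.0 - math.exp(-((h / self.range_) ** 2))


class Spherical(Semivariogram):
    def structure(self, h):
        if h >= self.range_:
            return 1.0
        ratio = h / self.range_
        return 1.5 * ratio - 0.5 * ratio ** 3


class SemivariogramFactory:
    """Simple Factory: the only place that maps model names to classes."""

    _models = {
        "exponential": Exponential,
        "gaussian": Gaussian,
        "spherical": Spherical,
    }

    @classmethod
    def create(cls, name, nugget, sill, range_):
        try:
            model_class = cls._models[name.lower()]
        except KeyError:
            raise ValueError(f"unknown semivariogram model: {name!r}") from None
        return model_class(nugget, sill, range_)


# Client code depends only on the abstract interface and on a model name.
model = SemivariogramFactory.create("spherical", nugget=0.1, sill=0.9, range_=5.0)
gamma_at_range = model.value(5.0)
```

With this layout, adding a model means writing one new subclass and registering its name in the factory, while the code that calls `create` and `value` remains unchanged.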

The dependency inversion principle, according to which high-level modules
should not depend on low-level modules,

Figure

Flow chart of
the connection of SIK to a GEOframe-NewAGE configuration as used in

A second MS is presented in Fig.

Flow chart of the connection of SIK to the SO-ET component. In this MS, the maps of the temperature and input of SO-ET are interpolated.

Figures

Here, we delineate the practices implemented in building the SIK package for
making it a reproducible research system (RRS) (e.g.

Although the initial code (let us call it v0.1) was already available from a
version control system under a GPL v3 license
(

Code v0.1 did not include a building tool. These tools can be considered a
modern evolution of the UNIX “make” (e.g.,

Another important step in the management of the code was the implementation
of a continuous integration system (

It ensures the building and testing of the source code at each commit, forcing the good practice of preparing tests for each software module developed.

Continuous integration

Since GitHub is a repository and not an archival system, we decided to use
Zenodo (

To test the performances of the modeling solutions presented in
Figs.

Geo-location of study area and position of meteorological stations.

The catchment area is about 4200 km

Data used for the testing were provided by Provincia Autonoma di Bolzano
(local government), and collected into the Adige database
(

In the available dataset (2003–2013) we identified the year with the smallest number of missing data, which was 2008, and then we used it to test the SIK components.

A quality check was made to eliminate any outliers. The spatial distribution of the missing values was also analyzed in order to choose the number of distance bins in which to compute the semivariance. To reduce the number of points in the experimental semivariogram, the pairs of locations are grouped according to their distance from one another, a process known as binning. For each time step, we found that about 10 % of stations were not recording data. Therefore, since the mean number of active stations for each time step was 70–80, we decided to use eight bins. This choice was also supported by a visual inspection of the shape of the experimental semivariance, which confirmed that with eight bins the number of stations involved was neither too low nor too high.
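The binning of station pairs can be illustrated with a minimal sketch of an experimental semivariogram. It assumes equal-width distance bins and the classical estimator (half the mean squared difference of the pairs in each bin), and it skips stations with missing values, represented here by `None`. The code is Python and the names are hypothetical; it is not the SIK component. A tiny dataset with three bins is used instead of the eight bins adopted in the study.

```python
import math


def empirical_semivariogram(points, values, n_bins):
    """Return a list of (mean distance, semivariance, pair count) per bin."""
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Stations that did not record data at this time step are skipped.
            if values[i] is None or values[j] is None:
                continue
            distance = math.dist(points[i], points[j])
            pairs.append((distance, (values[i] - values[j]) ** 2))
    if not pairs:
        raise ValueError("no valid station pairs")

    width = max(d for d, _ in pairs) / n_bins
    dist_sums = [0.0] * n_bins
    sq_sums = [0.0] * n_bins
    counts = [0] * n_bins
    for distance, squared_diff in pairs:
        # The largest distance falls on the upper edge and goes in the last bin.
        k = min(int(distance / width), n_bins - 1)
        dist_sums[k] += distance
        sq_sums[k] += squared_diff
        counts[k] += 1

    result = []
    for k in range(n_bins):
        if counts[k] > 0:  # empty bins are dropped
            result.append(
                (dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
            )
    return result


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, None, 4.0]  # the third station has a missing value
bins = empirical_semivariogram(points, values, n_bins=3)
```

The pair count returned for each bin is the quantity to check when deciding whether the chosen number of bins leaves enough pairs per bin.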

In order to assess the goodness of SIK performances, two applications were performed:

an interpolation of 1 year of hourly temperature data;

an interpolation of a rainfall event, also at hourly time steps.

First, the analysis of the semivariance was performed and experimental semivariograms were fitted using all 11 theoretical models. The model that gave the best fit was then used for the interpolation of the temperature and rainfall variables using the four types of kriging. Kriging performance was assessed using the LOO cross validation. The two local cases (LOK and LDK) were performed using a fixed number of closest stations. In particular, we decided to use 10 stations for the temperature case since it was a good compromise between the distance among the stations and the mean number of recording stations for each time step. For the local interpolation of precipitation, the number of closest stations was five, given the prevalently convective nature of summer precipitation and the lower number of active gauge stations for each time step. Finally, results obtained from the interpolation of the temperature dataset were compared to the results obtained with R gstat, in order to assess the differences between the two packages, their ease of use, and their performance.

The first application of SIK components was carried out using the temperature dataset. The hourly experimental semivariograms were computed and then fitted using the 11 available theoretical models.

Figure

Fitting of the experimental semivariogram using PSO for 15 June 2008 12:00 CET.
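To illustrate how a particle swarm can fit the nugget, sill, and range to an experimental semivariogram, the sketch below implements a basic global-best PSO that minimizes the sum of squared differences between an exponential model and a set of binned semivariances. It is a simplified Python stand-in, not the OMS3 PSO component used with SIK; the objective function, the swarm settings, the parameter bounds, and the synthetic data are all assumptions made for the example.

```python
import math
import random


def exponential(h, nugget, sill, range_):
    """Exponential model with the sill treated as a partial sill (assumption)."""
    return nugget + sill * (1.0 - math.exp(-h / range_))


def cost(params, lags, gammas):
    """Sum of squared residuals between the model and the binned semivariances."""
    nugget, sill, range_ = params
    return sum((exponential(h, nugget, sill, range_) - g) ** 2
               for h, g in zip(lags, gammas))


def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm; positions are clipped to the bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest = val, pos[i][:]
    return gbest, gbest_val


# Synthetic experimental semivariogram with eight bins, generated from known
# parameters (nugget 0.1, partial sill 1.0, range 2.0).
lags = [0.5 * k for k in range(1, 9)]
gammas = [exponential(h, 0.1, 1.0, 2.0) for h in lags]

# Bounds for (nugget, sill, range); the range is kept strictly positive.
bounds = [(0.0, 1.0), (0.1, 3.0), (0.1, 10.0)]
best_params, best_cost = pso(lambda p: cost(p, lags, gammas), bounds)
```

Because the swarm is initialized randomly, the fitted parameters depend on the seed and on the swarm settings; a fixed seed is used here so that repeated runs give the same result.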

Table

Performance results of semivariogram models used.

In order to assess the goodness of the interpolation, we performed the LOO cross validation using the optimized hourly values of sill, nugget, and range for the Bessel model, which is one of the best semivariograms according to the previous results.

Figure

For both the OK and LOK cases the performances were very poor
(NSE

A strong trend between temperature and elevation (

Monthly variation in the NSE index over the entire hourly temperature dataset using the Bessel semivariogram model.

The spatialization of temperature was performed for each pixel of the DEM,
applying the LDK and the Bessel semivariogram model.
Figure

Maps of spatialized temperature for 15 February 2008 and 15 June 2008. Two bubble plots are overlaid, which represent the RMSE between the measured and interpolated values.

The application to a rainfall dataset was made at event scale; specifically, a rainfall event of 11 h between 29 and 30 June 2008. The event was chosen because it was the longest and most intense recorded by the highest number of stations for 2008.

Figure

Box plots of the semivariograms of the precipitation event of 29 and 30 June 2008.

The optimized values of range, nugget, and sill were then used for the four types
of kriging interpolations. Figure

Comparison among the four types of kriging and the measured rainfall.

Table

Results in terms of goodness of fit indexes between the measured and interpolated rainfall values for two stations.

The spatial interpolation of the precipitation was also performed for each pixel
of the DEM, applying the LOK and the Bessel semivariogram model.
Figure

Spatial interpolation of the precipitation applying LOK and the Bessel semivariogram model. The bubble plot of the RMSE is overlaid.

Comparison between the performances of gstat and SIK packages in the interpolation of the temperature dataset.

Comparison between the performances of gstat and SIK packages in the interpolation of the rainfall dataset.

A comparison between SIK and the R package gstat was made in order
to highlight their differences and similarities, and to justify the
deployment of an alternative software. We performed a qualitative comparison
between the two packages accounting for design, the implemented features,
and the accuracy of the results. Benchmarks or quantitative performance
comparisons would not have been useful or completely truthful since the
“velocity” of computation (a classic quantitative comparison) depends on
too many factors, some of which are described below. Moreover, in our
opinion, the two tools that we analyzed have different purposes. This can be
seen just by looking at the features of the respective programming languages.

The gstat software is developed in C with a part of the code in the R
language. It must be executed using the various R environments. SIK is
developed in Java (7) as a group of OMS components and it can be executed
within the OMS console, as a stand-alone Java program, or embedded in other
codes in languages that support Java bindings. Java is generally slower than
natively compiled languages such as C. However, in the course of Java development
several optimizations, such as “just-in-time compilation” and “adaptive
optimization”, have been introduced to improve the performance of the Java
virtual machine (JVM). These techniques identify recurrently executed
code, so-called “hot spots”, and dynamically recompile it at
run time, so that the hot spots gain valuable computational speed. C is
one of the fastest compiled languages, but only the computational core of
gstat is coded in C; the management of temporal steps, such as
“for-loops”, and data structures must be scripted in R. Undoubtedly, R is a
very powerful programming language, mainly because of its gentle learning
curve and easy syntax and semantics, but it is interpreted, which
makes it slow. As a result, the comparison of the speed of computation
for a single temporal kriging interpolation is biased against Java since the
JVM cannot exploit its optimization tools for a single computation. Conversely, the comparison of the speed of computation for a year of hourly
kriging interpolations is biased against R because the loop over temporal steps
accounts for most of the computational time.

In terms of functionality, gstat
computes both omnidirectional and directional semivariograms, while SIK does
not implement directional semivariograms yet (although we have included this
feature on the software wish list). Furthermore, gstat provides four
theoretical semivariogram models that SIK lacks: Matern, Matern
with Stein's parameterizations, Wave, and Legendre. Adding the desired
theoretical model to the SIK-TV component would be easy and straightforward,
thanks to the DP implemented, as shown in Fig.

Figure

Figure

In conclusion, gstat is a powerful, flexible tool to obtain fast results with fast scripting in answer to single, specific questions (with some implementation effort on the user side); SIK is a tool that is ready to be integrated into broader MSs, specifically because of its OMS-compliant design. The interpolations of both temperature and rainfall confirm the quality and accuracy of the predictions obtained using the SIK package, demonstrating that it is a good competitor of R gstat.

This paper presents a new modeling package for the spatial interpolation of environmental variables. It includes 11 theoretical semivariogram models and four types of kriging interpolations. To test the performance of the SIK package, two applications were performed: the interpolation of 1 year of temperatures and the interpolation of a rainfall event. Data were retrieved from a dataset of 97 stations located in the Isarco Valley in Italy and the resolution of the interpolation grid data was 100 m.

Several characteristics make the SIK package a good competitor tool among those available in the literature. From the user
perspective,

it can be used as a stand-alone;

it can be plugged in to the hydrological modeling system GEOframe-NewAGE;

it can be used with all OMS-compliant components, such as calibration tools for the optimization of the parameters;

it includes a tool for the automatic estimation of errors;

its results are presented in data formats that can be visualized directly by GIS;

a variety of MSs can be obtained, according to user needs;

it is faster than gstat in everyday use, under certain conditions.

From the programmer perspective, the implementation of DPs makes the package easy to maintain and suitable for future improvements. All the elements are closed to modification and open to extension. Further developments of the package are easy and straightforward. Examples of such developments might include integrating new types of kriging, implementing a different selection method of the gauge stations, and adding nonlinear relationships between the interpolated variable and an auxiliary variable.

The interpolations of both the temperature and the rainfall gave very good results, with a high agreement between the measured and the interpolated variables. The tests also show how it is possible to choose among 11 variograms and four kriging alternatives and to compare the outcomes easily. Unlike temperature, the single rainfall event did not show a trend with elevation.

In comparison with gstat, the SIK package proved to be a good alternative, regarding both the ease of use and the accuracy of the interpolation.

An OSF project with all the components
needed to reproduce the results shown in this paper has been created and is
available at the following link:

Kriging is a group of geostatistical techniques used to
interpolate the value of random fields based on spatial autocorrelation of
measured data,

As shown in various textbooks, e.g.,

If isotropy of the spatial statistics of the quantity analyzed is assumed,
then the semivariogram is given by (e.g.,

Once B has been determined, the system (

Coefficient of determination

The coefficient of determination,

Nash–Sutcliffe efficiency

The Nash–Sutcliffe efficiency (NSE) is a normalized model efficiency
coefficient. It determines the relative magnitude of the residual variance
compared to the measured data variance

where

Percent bias

Percent bias (PBIAS) measures the average tendency of the simulated values to be larger or smaller than the corresponding measured ones. The optimal value of PBIAS is 0, with small values indicating accurate model simulation. Positive values indicate overestimation bias, while negative values indicate model underestimation bias.

where

Root-mean-square error

The root-mean-square error (RMSE) is given by

where
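The NSE, PBIAS, and RMSE indices described in this appendix can be computed with a few lines of code. The sketch below is in Python and uses the standard definitions; for PBIAS it adopts the sign convention stated above, so that positive values indicate overestimation. The function names and the sample values are hypothetical.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over data variance."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def pbias(observed, simulated):
    """Percent bias; positive when the simulated values overestimate."""
    return 100.0 * sum(s - o for o, s in zip(observed, simulated)) / sum(observed)


def rmse(observed, simulated):
    """Root-mean-square error between measured and simulated values."""
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))


# Hypothetical measured and interpolated values at three stations.
measured = [1.0, 2.0, 3.0]
interpolated = [2.0, 3.0, 4.0]

nse_value = nse(measured, interpolated)
pbias_value = pbias(measured, interpolated)
rmse_value = rmse(measured, interpolated)
```

A perfect interpolation gives NSE = 1, PBIAS = 0, and RMSE = 0, while an interpolation that always returns the mean of the measurements gives NSE = 0.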

Using

Bessel semivariogram

Circular semivariogram

Exponential semivariogram

Gaussian semivariogram

Hole semivariogram

Linear semivariogram

Logarithmic semivariogram

Pentaspherical semivariogram

Periodic semivariogram

Power semivariogram

Spherical semivariogram

MB, GF, and FS developed the model code integrated in the GEOframe-SIK package. MB, FS, MB, and WA designed the experiments and performed the simulations. RR planned the research, coordinated and supervised all its phases, and provided the financial support with his funding. MB prepared the paper with contributions from all co-authors.

The authors declare that they have no conflict of interest.

The authors acknowledge Trento University project CLIMAWARE
(