Multiple-point geostatistics enable the realistic simulation of complex
spatial structures by inferring statistics from a training image. These
methods are typically computationally expensive and require complex
algorithmic parametrizations. The approach that is presented in this paper
is easier to use than existing algorithms, as it requires few independent
algorithmic parameters. It is natively designed for handling continuous
variables and quickly implemented by capitalizing on standard libraries.
The algorithm can handle incomplete training images of any dimensionality,
with categorical and/or continuous variables, and stationarity is not
explicitly required. It is possible to perform unconditional or conditional
simulations, even with exhaustively informed covariates. The method provides
new degrees of freedom by allowing kernel weighting for pattern matching.
Computationally, it is adapted to modern architectures and runs in constant
time. The approach is benchmarked against a state-of-the-art method. An
efficient open-source implementation of the algorithm is released as part of
the G2S package (see the code availability section).
The highlights are the following:
- A new approach is proposed for pixel-based multiple-point geostatistics simulation.
- The method is flexible and straightforward to parametrize.
- It natively handles continuous and multivariate simulations.
- It has high computational performance with predictable simulation times.
- A free and open-source implementation is provided.
Geostatistics is used widely to generate stochastic random fields for modeling and characterizing spatial phenomena such as Earth's surface features and geological structures. Commonly used methods, such as the sequential Gaussian simulation (Gómez-Hernández and Journel, 1993) and turning bands algorithms (Matheron, 1973), are based on kriging (e.g., Graeler et al., 2016; Li and Heap, 2014; Tadić et al., 2017, 2015). This family of approaches implies spatial relations using exclusively pairs of points and expresses these relations using covariance functions. In the last 2 decades, multiple-point statistics (MPS) emerged as a method for representing more complex structures using high-order nonparametric statistics (Guardiano and Srivastava, 1993). To do so, MPS algorithms rely on training images, which are images with similar characteristics to the modeled area. Over the last decade, MPS has been used for stochastic simulation of random fields in a variety of domains such as geological modeling (e.g., Barfod et al., 2018; Strebelle et al., 2002), remote-sensing data processing (e.g., Gravey et al., 2019; Yin et al., 2017), stochastic weather generation (e.g., Oriani et al., 2017; Wojcik et al., 2009), geomorphological classification (e.g., Vannametee et al., 2014), and climate model downscaling (a domain that has typically been the realm of kriging-based methods; e.g., Bancheri et al., 2018; Jha et al., 2015; Latombe et al., 2018).
In the world of MPS simulations, one can distinguish two types of approaches. The first category is the patch-based methods, where complete patches of the training image are imported into the simulation. This category includes methods such as SIMPAT (Arpat and Caers, 2007) and DISPAT (Honarkhah and Caers, 2010), which are based on building databases of patterns, and image quilting (Mahmud et al., 2014), which uses an overlap area to identify patch candidates, which are subsequently assembled using an optimal cut. CCSIM (Tahmasebi et al., 2012) uses cross-correlation to rapidly identify optimal candidates. More recently, Li (2016) proposed a solution that uses graph cuts to find an optimal cut between patches, which has the advantage of operating easily, efficiently, and independently of the dimensionality of the problem. Tahmasebi (2017) proposed a solution that is based on “warping” in which the new patch is distorted to match the previously simulated areas. For a multivariate simulation with an informed variable, Hoffimann et al. (2017) presented an approach for selecting a good candidate based on the mismatch of the primary variable and on the mismatch rank of the candidate patches for auxiliary variables. Although patch-based approaches are recognized to be fast, they are typically difficult to use in the presence of dense conditioning data. Furthermore, patch-based approaches often suffer from a lack of variability due to the pasting of large areas of the training image, which is a phenomenon that is called verbatim copy. Verbatim copy (Mariethoz and Caers, 2014) refers to the phenomenon whereby the neighbor of a pixel in the simulation is the neighbor in the training image. This results in large parts of the simulation that are identical to the training image.
The second category of MPS simulation algorithms consists of pixel-based algorithms, which import a single pixel at a time instead of full patches. These methods are typically slower than patch-based methods. However, they do not require a procedure for the fusion of patches, such as an optimal cut, and they allow for more flexibility in handling conditioning data. Furthermore, in contrast to patch-based methods, pixel-based approaches rarely produce artefacts when dealing with complex structures. The first pixel-based MPS simulation algorithm was ENESIM, which was proposed by Guardiano and Srivastava (1993), where for a given categorical neighborhood – usually small – all possible matches in the training image are searched. The conditional distribution of the pixel to be simulated is estimated based on all matches, from which a value is sampled. This approach could originally handle only a few neighbors and a relatively small training image; otherwise, the computational cost would become prohibitive and the number of samples insufficient for estimating the conditional distribution. Inspired by research in computer graphics, where similar techniques are developed for texture synthesis (Mariethoz and Lefebvre, 2014), an important advance was the development of SNESIM (Strebelle, 2002), which proposes storing in advance all possible conditional distributions in a tree structure and using a multigrid simulation path to handle large structures. With IMPALA, Straubhaar et al. (2011) proposed reducing the memory cost by storing information in lists rather than in trees. Another approach is direct sampling (DS) (Mariethoz et al., 2010), where the estimation and the sampling of the conditional probability distribution are bypassed by sampling directly in the training image, which incurs a very low memory cost. DS enabled the first use of pixel-based simulations with continuous variables.
DS can use any distance formulation between two patterns; hence, it is well suited for handling various types of variables and multivariate simulations.
In addition to its advantages, DS has several shortcomings: DS requires a threshold – which is specified by the user – that enables the algorithm to differentiate good candidate pixels in the training image from bad ones based on a predefined distance function. This threshold can be highly sensitive and difficult to determine and often dramatically affects the computation time. This results in unpredictable computation times, as demonstrated by Meerschman et al. (2013). DS is based on the strategy of randomly searching the training image until a good candidate is identified (Shannon, 1948). This strategy is an advantage of DS; however, it can also be seen as a weakness in the context of modern computer architectures. Indeed, random memory access and high conditionality can cause (1) suboptimal use of the instruction pipeline, (2) poor memory prefetch, (3) substantial reduction of the useful memory bandwidth, and (4) impossibility of using vectorization (Shen and Lipasti, 2013). While the first two problems can be addressed with modern compilers and pseudorandom sequences, the last two are inherent to the current memory and CPU construction.
This paper presents a new and flexible pixel-based simulation approach,
namely QuickSampling (QS), which makes efficient use of modern hardware.
Our method takes advantage of the possibility of decomposing the standard
distance metrics that are used in MPS into sums of cross-correlations, which
can be computed efficiently using fast Fourier transforms (FFTs).
The remainder of this paper is structured as follows: Sect. 2 presents the proposed algorithm with an introduction to the general method of sequential simulation, the mismatch measurement using FFTs, and the sampling approach of using partial sorting, followed by methodological and implementation optimizations. Section 3 evaluates the approach in terms of quantitative and qualitative metrics via simulations and conducts benchmark tests against DS, which is the only other available approach that can handle continuous pixel-based simulations. Section 4 discusses the strengths and weaknesses of QS and provides guidelines. Finally, the conclusions of this work are presented in Sect. 5.
We recall the main structure of pixel-based MPS simulation algorithms (Mariethoz and Caers, 2014, p. 156), which is summarized and adapted for QS in the pseudocode given below. The key difference between existing approaches is in lines 3 and 4 of Fig. 1, when candidate patterns are selected. This is the most time-consuming task in many MPS algorithms, and we focus only on computing it in a way that reduces its cost and minimizes the parametrization. The pseudocode for the QS algorithm is given here.
Inputs:
For each unsimulated pixel along the simulation path:
1. find the neighborhood of the pixel;
2. compute the mismatch map between this neighborhood and the training image (Sect. 2.3);
3. select a good candidate using quantile sorting over the mismatch map (Sect. 2.4);
4. assign the value of the selected candidate to the simulated pixel.

The choice of the pattern metric:
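The sequential loop described above can be sketched in a few lines of Python. This is a deliberately naive, brute-force sketch (no FFTs, no multigrid, no conditioning data); the function and parameter names are hypothetical, and the mismatch map is computed by direct scanning only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(ti, shape, n_neighbors=10, k=2):
    """Brute-force sketch of pixel-based sequential simulation."""
    sim = np.full(shape, np.nan)
    for p in rng.permutation(sim.size):            # random simulation path
        y = np.array(np.unravel_index(p, shape))
        known = np.argwhere(~np.isnan(sim))
        if len(known) == 0:                        # first pixel: draw directly from the TI
            sim[tuple(y)] = ti[tuple(rng.integers(0, s) for s in ti.shape)]
            continue
        # 1. find the neighborhood: closest already simulated pixels
        order = np.argsort(np.abs(known - y).sum(axis=1))[:n_neighbors]
        lags = known[order] - y
        vals = np.array([sim[tuple(y + l)] for l in lags])
        # 2. compute the mismatch map over every training-image position
        mm = np.full(ti.shape, np.inf)
        for x in np.ndindex(ti.shape):
            pos = lags + np.array(x)
            ok = np.all((pos >= 0) & (pos < np.array(ti.shape)), axis=1)
            if ok.any():
                mm[x] = np.mean((ti[tuple(pos[ok].T)] - vals[ok]) ** 2)
        # 3. select a good candidate among the k best of the mismatch map
        best = np.argpartition(mm.ravel(), k - 1)[:k]
        x = np.unravel_index(rng.choice(best), ti.shape)
        # 4. assign the value of the selected candidate
        sim[tuple(y)] = ti[x]
    return sim

ti = rng.normal(size=(20, 20))
out = simulate(ti, (6, 6))
```

In QS, step 2 is replaced by FFT-based cross-correlations and step 3 by a partial sort, which is what makes the cost independent of the pixel values.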
Distance-based MPS approaches are based on pattern matching
(Mariethoz and Lefebvre, 2014). Here, we rely on the observation
that many common matching metrics can be expressed as weighted sums of the
pixelwise mismatch $\varepsilon$:
$$\mathcal{D}\big(N(x),N(y)\big)=\sum_i \omega_i\,\varepsilon\big(T(x+l_i),S(y+l_i)\big),\qquad(1)$$
where $T$ denotes the training image, $S$ the simulation, $l_i$ the lag
vectors that define the neighborhood, and $\omega_i$ the weights.
For the simulation of continuous variables, the most commonly used mismatch
metric is the $L^2$ norm, $\varepsilon(a,b)=(a-b)^2$. Using Eq. (1), this
metric decomposes into
$$\mathcal{D}=\sum_i \omega_i\Big(T(x+l_i)^2-2\,T(x+l_i)\,S(y+l_i)+S(y+l_i)^2\Big),$$
i.e., into terms that can each be computed as a cross-correlation.
For multivariate pixels, such as a combination of categorical and continuous
values, the mismatch can be computed as a weighted sum of the univariate
mismatches of the individual variables.
The approach that is proposed in this work is based on computing a mismatch map in the training image (TI) for each simulated pixel. The mismatch map is a grid that represents the patternwise mismatch for each location of the training image and enables the fast identification of a good candidate, as shown by the red circle in Fig. 1.
Example of a mismatch map for an incomplete pattern. Blue represents good matches, yellow bad matches, and purple missing and unusable (border effect) data. The red circle highlights the minimum of the mismatch map, which corresponds to the location of the best candidate.
If we consider the neighborhood $N(y)$ of the pixel $y$ to be simulated, the mismatch map is obtained by evaluating Eq. (1) for each possible position $x$ of the training image.
Some lags may not correspond to a value, for example, due to edge effects in
the considered images or because the patterns are incomplete. Missing
patterns are inevitable during the course of a simulation using a sequential
path. Furthermore, in many instances, there can be missing areas in the
training image. This is addressed by creating an indicator variable to be
used as a mask, which equals 1 at informed positions and 0 elsewhere.
Then, Eq. (5) can be expressed as follows:
Combining Eqs. (4) and (7), we get
Finally, with
Let us consider the general case in which only some variables are informed
and the weighting can vary for each variable. Equation (10) can be extended for this case by defining separate masks and weights
Equation (11) can be expressed using the convolution
theorem applied to cross-correlation:
By linearity of the Fourier transform, the summation can be performed in
Fourier space, thereby reducing the number of transformations:
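To make this concrete, the following Python sketch builds a masked, kernel-weighted L2 mismatch map from three FFT cross-correlations and checks it against direct evaluation. All names and sizes here are illustrative, and the notation is a simplified stand-in for the exact formulation of Eq. (13).

```python
import numpy as np

rng = np.random.default_rng(1)

def xcorr(img, ker):
    """Linear cross-correlation via FFT, cropped to fully overlapping positions."""
    s = [i + k - 1 for i, k in zip(img.shape, ker.shape)]
    F = np.fft.rfftn(img, s)
    G = np.fft.rfftn(ker[::-1, ::-1], s)      # flipping turns convolution into correlation
    full = np.fft.irfftn(F * G, s)
    return full[ker.shape[0] - 1:img.shape[0], ker.shape[1] - 1:img.shape[1]]

T = rng.normal(size=(40, 40))                 # training image
v = rng.normal(size=(5, 5))                   # searched pattern
m = (rng.random((5, 5)) < 0.6).astype(float)  # indicator mask: 1 = informed lag
w = np.exp(-0.5 * np.hypot(*(np.indices((5, 5)) - 2)))  # kernel weights

# D(x) = sum_i w_i m_i (T(x+l_i) - v_i)^2, expanded into cross-correlations
v0 = np.nan_to_num(v * m)
D = xcorr(T ** 2, w * m) - 2 * xcorr(T, w * v0) + np.sum(w * m * v0 ** 2)

# brute-force check of the decomposition
Dref = np.empty_like(D)
for i in range(D.shape[0]):
    for j in range(D.shape[1]):
        Dref[i, j] = np.sum(w * m * (T[i:i + 5, j:j + 5] - v0) ** 2)
assert np.allclose(D, Dref)
```

Only the second cross-correlation depends on the simulated pattern; the first depends on the mask alone, which is what makes precomputation and batching effective.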
Equation (13) is appropriate for modern computers,
which are well suited for computing FFTs (Cooley et al., 1965; Gauss,
1799). Currently, FFTs are well implemented in highly optimized libraries
(Rodríguez, 2002). Equation (13) is the
expression that is used in our QS implementation, because it reduces the
number of Fourier transforms, which are the most computationally expensive
operations of the algorithm. One issue with the use of FFTs is that the
image is implicitly assumed to be periodic; positions affected by this
border effect are masked and excluded from the candidate search (the purple
areas in Fig. 1).
The computation of the mismatch map (Eq. 13) is
deterministic; as a result, it incurs a constant computational cost that is
independent of the pixel values. Additionally,
Eq. (13) is expressed without any constraints on
the dimensionality. Therefore, it is possible to use the same formulation, and the same implementation, for simulations of any dimensionality (1D, 2D, 3D, or more).
The second contribution of this work is the candidate selection strategy: candidates are sampled among the best entries of the mismatch map using partial sorting.
In detail, the candidate selection procedure is as follows: All possible
candidates are ranked according to their mismatch, and one candidate is
randomly sampled among the k best candidates, where k is a user-defined parameter.
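This sampling step can be sketched with a partial selection of the k lowest-mismatch positions; the function name below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_candidate(mismatch_map, k):
    """Pick one candidate uniformly among the k lowest-mismatch positions."""
    flat = mismatch_map.ravel()
    best = np.argpartition(flat, k - 1)[:k]   # k smallest, unordered, O(n)
    return np.unravel_index(rng.choice(best), mismatch_map.shape)

mm = rng.random((100, 100))
pos = sample_candidate(mm, k=1)
assert mm[pos] == mm.min()                    # k = 1 always returns the best match
picks = {sample_candidate(mm, k=10) for _ in range(200)}
assert 1 < len(picks) <= 10                   # larger k adds variability
```

A partial selection avoids the full sort of the mismatch map, which matters because only a handful of candidates are ever used.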
Illustration of the candidate selection by quantile sorting over the mismatch map.
An alternative sampling strategy for reducing the simulation time is presented in Appendix A3. However, this strategy can result in a reduction in the simulation quality.
The value of a non-integer k is handled stochastically by choosing, for each simulated pixel, one of the two nearest integers with probabilities such that the expected value equals k.
In many applications, spatially exhaustive TIs are available. In such cases, the equations above can be simplified by dropping constant terms from Eq. (1), thereby resulting in a simplified form for Eq. (13). Here, we take advantage of the ranking to know that a constant term will not affect the result.
As in Tahmasebi et al. (2012), in the
An efficient implementation of QS was achieved by (1) performing precomputations, (2) implementing an optimal partial sorting algorithm for selecting candidates, and (3) optimal coding and compilation. These are described below.
According to Eq. (13), the terms that depend only on the training image, such as its Fourier transform, are identical for every simulated pixel and can therefore be precomputed once.
Parameters that were used for the simulations in Fig. 3. Times are specified for simulations without parallelization. MVs represents multivariates.
Hardware that was used in the experiments.
In the QS algorithm, a substantial part of the computation cost is incurred
in identifying the k best candidates of the mismatch map, which requires a partial sort.
Examples of unconditional continuous and categorical simulations in 2D and 3D and their variograms. The first column shows the training images that were used, the second column one realization, and the third column quantitative quality metrics. MV v1, MV v2, and MV v3 represent a multivariate training image (and the corresponding simulation) using three variables. The first two metrics are scatter plots of MV v1 vs. MV v2 of the training image and the simulation, respectively. The third metric represents the reproduction of the variogram for each of MV v1, MV v2, and MV v3.
Due to the intensive memory access by repeatedly scanning large training
images, interpreted programming languages, such as MATLAB and Python, are
inefficient for a QS implementation and, in particular, for a parallelized
implementation. We provide a NUMA-aware (Blagodurov et al., 2010) and
flexible C/C++ implementation of QS.
The FFTW library (Frigo and Johnson, 2018) provides a flexible and
performant architecture-independent framework for computing FFTs.
This section presents illustrative examples for continuous and categorical case studies in 2D and in 3D. Additional tests are reported in Appendix A4. The parameters that are used for the simulations of Fig. 3 are reported in Table 1.
The results show that simulation results are consistent with what is typically observed with state-of-the-art MPS algorithms. While simulations can accurately reproduce TI properties for relatively standard examples with repetitive structures (e.g., MV, Strebelle, and Folds), training images with long-range features (typically larger than the size of the TI) are more difficult to reproduce, such as in the Berea example. For multivariate simulations, the reproduction of the joint distribution is satisfactory, as observed in the scatter plots (Fig. 3). More examples are available in Appendix A4, in particular Fig. A2 for 2D examples and Fig. A3 for 3D examples.
QS simulations are benchmarked against DS using the “Stone” training image
(Fig. 4). The settings that are used for DS are
based on optimal parameters that were obtained via the approach of Baninajar
et al. (2019), which uses stochastic optimization to find optimal
parameters. In DS, we use a fraction of scanned TI of
Training image that was used for benchmarking and sensitivity analysis.
The comparison is based on qualitative (Fig. 5) and quantitative (Fig. 6) metrics, which include directional and omnidirectional variograms, along with the connectivity function, the Euler characteristic (Renard and Allard, 2013), and cumulants (Dimitrakopoulos et al., 2010). The connectivity represents the probability for two random pixels to belong to the same connected component; this metric is well suited to detecting broken structures. The Euler characteristic represents the number of objects minus the number of holes in those objects and is particularly well adapted to detecting noise, such as salt-and-pepper artifacts, in the simulations. Cumulants are high-order statistics and therefore allow the relative positions of elements to be taken into account. The results demonstrate that the simulations are of a quality that is comparable to DS. With extreme settings (highest pattern reproduction regardless of the computation time), both algorithms perform similarly, which is reasonable since both are based on sequential simulation and both directly import data from the training image. The extra noise present in the simulations is visible in the Euler characteristic, which also shows that the use of a kernel can reduce this noise and yield better simulations.
With QS, kernel weighting allows for fine tuning of the parametrization to
improve the results, as shown in Fig. 8. In this
paper, we use an exponential kernel $\omega_i = e^{-\alpha\,\|l_i\|}$, where the parameter $\alpha$ controls how quickly the weight decays with the distance of the lag $l_i$ from the pattern center.
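Such a kernel can be generated in a few lines; the sketch below assumes the form w(l) = exp(-alpha * ||l||) over a square window of lags, which is one common choice (the exact form used in the experiments may differ).

```python
import numpy as np

def exponential_kernel(radius, alpha):
    """Weight map w(l) = exp(-alpha * ||l||) over a (2r+1)^2 window of lags."""
    ax = np.arange(-radius, radius + 1)
    dist = np.hypot(*np.meshgrid(ax, ax, indexing="ij"))
    return np.exp(-alpha * dist)

w = exponential_kernel(radius=3, alpha=0.5)
assert w.shape == (7, 7)
assert w[3, 3] == 1.0                 # the central lag gets full weight
assert w[0, 0] < w[3, 0] < w[3, 2]    # weights decay with lag distance
```

With alpha = 0 this reduces to a uniform kernel; larger alpha concentrates the matching on the closest neighbors.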
Examples of conditional simulations and their standard deviation over 100 realizations that are used in the benchmark between QS and DS.
Benchmark between QS (with and without kernel) and DS over six metrics, using 100 unconditional simulations for each configuration.
In this section, we perform a sensitivity analysis on the parameters of QS
using the training image in Fig. 4. Only essential
results are reported in this section (Figs. 7 and 8); more exhaustive test results are
available in the appendix
(Figs. A4 and A5).
The two main parameters of QS are the number of neighbors n and the number of candidates k.
Figure 7 (and Appendix Fig. A4) shows that large
In this section, we investigate the scalability of QS with respect to the size of the simulation grid, the size of the training image grid, the number of variables, incomplete training images, and hardware. According to the test results, the code will continue to scale with new-generation hardware.
As explained in Sect. 2.3 and
2.4, the amounts of time that are consumed by the two
main operations of QS (finding candidates and sorting them) are independent
of the pixel values. Therefore, the training image that is used is not
relevant. (Here, we use simulations that were performed with the training image of
Fig. 4 and its classified version for categorical
cases.) Furthermore, the computation time is independent of the
parametrization (i.e., of the values chosen for the number of neighbors and the number of candidates).
We also test our implementation on different types of hardware, as summarized in Table 2. We expect Machine (2) to be faster than Machine (1) for medium-sized problems due to the high-memory-bandwidth requirement of QS. Machine (3) should also be faster than Machine (1), because it takes advantage of a longer vector computation (512-bit vs. 256-bit instruction set).
Sensitivity analysis on one simulation for the two main parameters of QS using a uniform kernel.
Figure 9 plots the execution times on the three tested machines for continuous and categorical cases and with training images of various sizes. Since QS has a predictable execution time, the influence of the parameters on the computation time is predictable: linear with respect to the number of variables (Fig. 9a, b), linear with respect to the size of the simulation grid, and following a power function of the size of the training image (Fig. 9c). Therefore, via a few tests on a set of simulations, one can predict the computation time for any other setting.
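Because the run time follows a power function of the training-image size, a handful of measurements suffice to extrapolate to other sizes via a log-log fit. The timings below are invented purely for illustration; only the fitting procedure is the point.

```python
import numpy as np

# hypothetical timings (seconds) measured on a few training-image sizes
sizes = np.array([1e4, 4e4, 1.6e5, 6.4e5])
times = np.array([0.8, 3.5, 15.1, 66.0])

# fit t = c * N^p  <=>  log t = p * log N + log c
p, logc = np.polyfit(np.log(sizes), np.log(times), 1)
predict = lambda n: np.exp(logc) * n ** p

assert 0.9 < p < 1.3                            # near-linear growth in this sketch
assert abs(predict(6.4e5) - 66.0) / 66.0 < 0.2  # extrapolation matches the data
```

The same procedure applies to the other scaling axes (number of variables, simulation-grid size), which are linear and therefore even simpler to extrapolate.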
Figure 9d shows the scalability of the algorithm when
using the path-level parallelization. The algorithm scales well until all
physical cores are being used. Machine (3) has a different scaling factor
(slope). This suboptimal scaling is attributed to the limited memory
bandwidth. Our implementation of QS scales well with an increasing number of
threads (Fig. 9d), with an efficiency above 80 %
using all possible threads. The path-level parallelization strategy that was
used involves a bottleneck for large numbers of threads due to the need to
wait for neighborhood conflicts to be resolved (Mariethoz, 2010). This effect
typically appears for large values of the number of neighbors n, for which neighborhood conflicts are more frequent.
Sensitivity analysis on the kernel parameter
Efficiency of QS with respect to all key parameters. Panels
The parametrization of the algorithm (and therefore simulation quality) has
almost no impact on the computational cost, which is an advantage. Indeed,
many MPS algorithms impose trade-offs between the computation time and the
parameters that control the simulation quality, thereby imposing difficult
choices for users. QS is comparatively simpler to set up in this regard. In
practice, a satisfactory parametrization strategy is often to start with a
small number of candidates k and a relatively large number of neighbors n, and to then adjust both based on the observed simulation quality.
QS is adapted for simulating continuous variables using the L2 distance.
Combining multiple continuous and categorical variables can be challenging
for MPS approaches. Several strategies have been developed to overcome this
limitation, using either a different distance threshold for each variable
or a linear combination of the errors. Here we use the second approach,
taking advantage of the linearity of the Fourier transform. The relative
importance of each variable can be set through the weights of this linear combination.
There may be cases where QS is slower than DS, in particular when a large and highly repetitive training image is used. In such cases, DS can be advantageous, as it needs to scan only a very small part of the training image. For scenarios of this type, it is possible to adapt QS such that only a small subset of the training image is used; this approach is described in Appendix A3. For highly repetitive training images, this observation also holds for SNESIM and IMPALA.
Furthermore, QS is designed to efficiently handle large and complex training images (up to 10 million pixels) with high pattern variability and few repetitions. Larger training images may be computationally burdensome, which could be alleviated by a GPU implementation, potentially allowing gains of up to 2 orders of magnitude.
QS can be extended to handle the rotation and scaling of patterns by applying a constant rotation or affinity transformation to the searched patterns (Strebelle, 2002). However, the use of rotation-invariant and affinity-invariant distances (as in Mariethoz and Kelly, 2011), while possible in theory, would substantially increase the computation time. Mean-invariant distances can be implemented by simply adapting the distance formulation in QS. All these advanced features are outside the scope of this paper.
QS is an alternative approach for performing pixel-based multiple-point simulations. It is particularly well suited to simulating continuous variables and to taking advantage of modern computer architectures.
The QS framework provides a complete and explicit mismatch map, which can be used to formulate problem-specific rules for sampling or even solutions that take the complete conditional probability density function into account, e.g., a narrowness criterion for the conditional probability density function of the simulated value (Gravey et al., 2019; Rasera et al., 2020), or to use the mismatch map to infer the optimal parameters of the algorithm.
Standard partial sorting algorithms resolve tie ranks deterministically, which does not accord with the objective of stochastic simulation with QS, where variability is sought. Here, we propose an online heap-based partial sort. It is realized with a single scan of the array of data using a heap to store previously found values. This approach is especially suitable when we are interested in a small fraction of the entire array.
Random positions of the candidates that have tied mismatch values are obtained by randomly swapping equal values during their insertion into the heap.
This partial sort outperforms random exploration of the mismatch map. However, it is difficult to implement efficiently on GPUs. A solution is still possible for shared-memory GPUs by performing the partial sort on the CPU. This is currently available in the proposed implementation.
For each value of the mismatch map:
1. if the value is larger than the current heap maximum, skip it;
2. if the value is strictly smaller than the heap maximum, insert it at the expected position and increment the counter of stored candidates;
3. if the value equals a stored value, swap the last position with another position of the same value before inserting it, so that tied candidates are retained with equal probability.
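A minimal Python version of such an online partial sort is sketched below using the standard-library heapq module. The tie handling uses a fixed swap probability, which is a simplification of the exact tie-randomization scheme described above.

```python
import heapq, random

def k_smallest_online(values, k, rng=random.Random(3)):
    """Single-pass selection of the k smallest values using a max-heap.

    Ties with the current heap maximum are resolved by a random swap so
    that equal-valued candidates all have a chance of being retained.
    """
    heap = []                                  # max-heap via negated (value, index) pairs
    for i, v in enumerate(values):
        if len(heap) < k:
            heapq.heappush(heap, (-v, i))
        elif v < -heap[0][0]:
            heapq.heapreplace(heap, (-v, i))   # strictly better: replace the worst kept
        elif v == -heap[0][0] and rng.random() < 0.5:
            heapq.heapreplace(heap, (-v, i))   # tie: randomly swap with the kept value
    return sorted((-nv, i) for nv, i in heap)

top = k_smallest_online([5, 1, 4, 1, 5, 9, 2, 6, 5, 3], 3)
assert [v for v, _ in top] == [1, 1, 2]
```

Each element is inspected once and the heap never exceeds k entries, so the cost is O(n log k), which is what makes the approach attractive when k is a tiny fraction of the mismatch map.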
To handle categorical variables, a standard approach is to consider each category as an independent variable. This requires as many FFTs as there are classes, which renders QS expensive when many categories are present.
An alternative approach is to encode the categories and to decode the mismatch from the cross-correlation. It has the advantage of requiring only a single cross-correlation for each simulated pattern.
Here, we propose encoding the categories as powers of the number of
neighbors, such that their product is equal to one if the class matches. In
all other cases, the value is smaller than one or larger than the number of
neighbors.
In this scenario, the encoded distance takes the place of the pixelwise mismatch ε in Eq. (1).
Table A1 describes this process for three classes,
namely
Consider the following combination:
This encoding strategy provides the possibility of drastically reducing the
number of FFT computations. However, the decoding phase is not always
implementable if a nonuniform matrix of weights is used.
The principle of considering a fixed number of candidates can be extended by,
instead of taking the k best candidates over the entire training image, splitting the training image into subsets and taking the best candidate of each subset.
The results of applying this strategy are presented in Table A2 and Fig. A1. The experimental results demonstrate that the partial exploration approach that is provided by splitting substantially accelerates the processing time. However, Fig. A1 shows that the approach has clear limitations when dealing with training images with complex and nonrepetitive patterns. The absence of local verbatim copy can explain the poor-quality simulation results.
Example of encoding for three classes and nine neighbors and their associated products. The emboldened main diagonal shows the situations when the search classes match their corresponded values.
Comparison of QS using the entire training image and using training image splitting. In these examples, the training image is split into two images over each dimension. The original training images are presented in Fig. 2.
Computation times and speedups for the full and partial exploration approaches. Times are specified for simulations with path-level parallelization.
Examples of 2D simulations: the first three rows represent three variables of a single simulation. Parameters available in Table A3.
Examples of 3D simulation results. Parameters available in Table A4.
Simulation parameters for Fig. A2. Times are specified for simulations without parallelization.
Simulation parameters for Fig. A3. Times are specified for simulations without parallelization.
Complete sensitivity analysis, with one simulation for the two main parameters of QS.
Complete sensitivity analysis, with one simulation for each
kernel with
The convolution theorem (Stockham, 1966; Krant, 1999; Li and Babu, 2019) can
be easily extended to cross-correlation (Bracewell, 2000). The following
derivation shows the validity of the theorem for any pair of functions.
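The identity can also be checked numerically; the sketch below compares a direct circular cross-correlation against its Fourier-domain form for arbitrary real signals.

```python
import numpy as np

rng = np.random.default_rng(5)
f = rng.normal(size=64)
g = rng.normal(size=64)

# circular cross-correlation: (f * g)[k] = sum_j f[j] * g[(j + k) mod N]
direct = np.array([np.sum(f * np.roll(g, -k)) for k in range(64)])

# convolution theorem for cross-correlation: f * g = IFFT(conj(FFT(f)) . FFT(g))
viafft = np.fft.ifft(np.conj(np.fft.fft(f)) * np.fft.fft(g)).real

assert np.allclose(direct, viafft)
```

The FFT route replaces the O(N^2) direct sum with O(N log N) work, which is the source of the computational gain exploited by QS.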
The source code and documentation of the QS simulation algorithm are
available as part of the G2S package at:
MG proposed the idea, implemented and optimized the QS approach and wrote the article. GM provided supervision, methodological insights and contributed to the writing of the article.
The authors declare that they have no conflict of interest.
This research was funded by the Swiss National Science Foundation. Thanks to Intel for allowing us to conduct numerical experiments on their latest hardware using the AI DevCloud. Thanks to Luiz Gustavo Rasera for his comments, which greatly improved the article; to Ehsanollah Baninajar for running his optimization method, which improved the reliability of the benchmarks; and to all the early users of QS for their useful feedback and their patience in waiting for this article. Particular thanks to Ute Mueller and Thomas Mejer Hansen, who agreed to review the paper and provided constructive comments that significantly improved its quality.
This research has been supported by the Swiss National Science Foundation (grant no. 200021_162882).
This paper was edited by Adrian Sandu and reviewed by Ute Mueller and Thomas Mejer Hansen.