Accelerating Sequential Gaussian Simulation with a constant path
Introduction
Sequential Gaussian Simulation (SGS) is a popular method for generating stochastic values on a grid under the constraints of a statistical model and, possibly, some initially known values, herein referred to as hard data. SGS has been extensively used by practitioners because of its intuitive theoretical basis, its simple numerical implementation, and its high flexibility (e.g. Gómez-Hernández and Journel, 1993, Pebesma and Wesseling, 1998). Arguably, the major drawback of SGS is its computational cost. Exact kriging estimation takes all conditioning nodes into account, which results in large linear systems that need to be solved. For a square matrix of size n, common linear solvers have a computational complexity of O(n^3) (Trefethen and Bau, 1997), which means that the computational effort is proportional to the cube of the matrix dimension. Therefore, the sequential simulation of a grid with N nodes represents an O(N^4)-type problem (Dimitrakopoulos and Luo, 2004).
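To make this cost concrete, the kriging system solved at each node can be sketched as follows (a minimal illustration assuming simple kriging with an exponential covariance of unit sill; the function names are ours, not from the paper):

```python
import numpy as np

def exp_cov(h, a=20.0):
    """Exponential covariance with unit sill and an assumed range parameter a."""
    return np.exp(-np.asarray(h, dtype=float) / a)

def simple_kriging_weights(cond_coords, target):
    """Solve the dense simple-kriging system C w = c for one target node.

    cond_coords : (n, d) coordinates of the conditioning nodes
    target      : (d,)   coordinates of the node being estimated
    The dense solve costs O(n^3); accumulated over the N nodes of a grid,
    this is what makes exact SGS an O(N^4)-type problem.
    """
    C = exp_cov(np.linalg.norm(cond_coords[:, None] - cond_coords[None, :], axis=-1))
    c = exp_cov(np.linalg.norm(cond_coords - target, axis=-1))
    return np.linalg.solve(C, c)
```

When the target coincides with a conditioning node, the solver returns a weight of 1 for that node and 0 elsewhere, as expected for an exact interpolator.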
Various attempts have been undertaken to reduce the associated computational cost. The most widespread approach is the so-called limited or moving neighbourhood, that is, the approximation of the kriging estimate by using only a limited number of conditioning points referred to as the neighbours (e.g. Isaaks and Srivastava, 1989, Deutsch and Journel, 1992, Goovaerts, 1997). This reduces the computational complexity of SGS to O(N k^3), where k denotes the number of neighbours. This approach is rooted in the observation that neighbours which are located far away from the simulated point receive small or even vanishing weights. This effect originates from the rapid decrease of correlation with distance inherent to most covariance functions and is enhanced by the presence of intermediate neighbours screening the influence of those behind (e.g. Chilès and Delfiner, 1999). However, the omission of neighbours has been shown to bias the simulation covariance matrix (Emery and Peláez, 2011, Nussbaumer et al., 2017), which in turn results in artifacts in the realizations (e.g. Meyer, 2004). Recent works on reducing such detrimental effects, while limiting the neighbourhood size and optimizing the computational efficiency, include those of Gribov and Krivoruchko (2004), Rivoirard and Romary (2011), and Dimitrakopoulos and Luo (2004).
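A moving-neighbourhood search can be sketched as follows (illustrative only; a full sort is used for clarity, whereas production codes rely on spatial indexing structures such as k-d trees):

```python
import numpy as np

def nearest_neighbours(simulated_coords, target, k):
    """Indices of the k already-simulated nodes closest to `target`.

    Restricting the kriging system to these k neighbours replaces each
    O(n^3) dense solve by an O(k^3) one, i.e. roughly O(N k^3) in total.
    """
    d = np.linalg.norm(np.asarray(simulated_coords, dtype=float) - target, axis=1)
    return np.argsort(d)[:min(k, len(d))]
```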
An alternative to reducing the size of the kriging covariance matrix is to approximate it. Barry and Kelley Pace (1997) formulate covariance-based kriging, which leads to the inversion of sparse symmetric matrices. Sparse matrix solvers considerably improve the computational performance, but this approach is limited to simulations based on covariance functions with a finite range. Furrer et al. (2006) and Memarsadeghi and Mount (2007) further increase the sparsity of the matrix by tapering the covariance for large lag-distances. Related approaches comprise the approximate iterative method (Billings et al., 2002), the low rank approximation (Kammann and Wand, 2003), the Sherman-Morrison-Woodbury formula (Sakata et al., 2004), and fast summation methods (Memarsadeghi et al., 2008, Srinivasan et al., 2008).
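As an illustration of covariance tapering, a compactly supported taper (here a Wendland-type function, one common choice; the specific combination below is our own assumption, not the cited authors' exact scheme) is multiplied into the covariance so that entries beyond the taper range vanish exactly, yielding a sparse matrix:

```python
import numpy as np

def wendland_taper(h, taper_range):
    """Wendland-type taper: positive definite and exactly zero beyond taper_range."""
    t = np.clip(np.asarray(h, dtype=float) / taper_range, 0.0, 1.0)
    return (1.0 - t) ** 4 * (4.0 * t + 1.0)

def tapered_cov(h, a=20.0, taper_range=30.0):
    """Exponential covariance multiplied by the taper; the resulting
    covariance matrix is sparse, enabling sparse linear solvers."""
    return np.exp(-np.asarray(h, dtype=float) / a) * wendland_taper(h, taper_range)
```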
Another approach is to only consider simulations whose covariance function is from a limited set of easily solvable covariance models. Omre et al. (1993) propose the screening sequential simulation, which provides exact simulations for covariance models with the Markov property, such as the exponential model in 1D. Hartman and Hössjer (2008) approximate the simulated Gaussian field with a set of Gaussian Markov random fields (Rue and Tjelmeland, 2002), which can be simulated exactly and efficiently. Finally, Cressie and Johannesson (2008) consider covariance models composed of a fixed number of basic non-stationary functions. This technique is also referred to as fixed-rank kriging. A related approach is the predictive processes method (e.g. Banerjee et al., 2008).
A more general technique to cope with the high computation costs of SGS is parallelization, which reduces the computation time by splitting the work among several cores (Vargas et al., 2007, Mariethoz, 2010, Nunes and Almeida, 2010, Rasera et al., 2015). It is important to note that parallelization does not reduce the computational burden, but merely spreads it over several cores, and hence is just a useful complement to the other techniques.
The approach explored in this study aims at decreasing the overall computational cost by taking advantage of the large number of realizations typically needed in geostatistical applications. Indeed, an uncertainty assessment can only be performed with an ensemble of realizations spanning the variability of outcomes. When the simulation path, that is, the order in which the nodes are simulated, is kept identical among multiple realizations, the neighbourhood configurations of each simulated node are also identical throughout these realizations. Because the kriging weights are computed solely from the relative distances between nodes, a constant neighbourhood configuration produces the same kriging weights. Therefore, these weights only need to be computed once and can then be re-used for all realizations. This reduces the computational effort of each additional realization to simple matrix multiplications.
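The key observation, that kriging weights depend only on the geometry of the neighbourhood, can be checked directly (a sketch with an assumed exponential covariance; translating the whole configuration leaves the weights unchanged):

```python
import numpy as np

def sk_weights(neigh_coords, target, cov):
    """Simple-kriging weights of one node given its neighbourhood."""
    d_nn = np.linalg.norm(neigh_coords[:, None] - neigh_coords[None, :], axis=-1)
    d_nt = np.linalg.norm(neigh_coords - target, axis=-1)
    return np.linalg.solve(cov(d_nn), cov(d_nt))

cov = lambda h: np.exp(-np.asarray(h, dtype=float) / 20.0)
neigh = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0]])
target = np.array([1.0, 1.0])

w1 = sk_weights(neigh, target, cov)
w2 = sk_weights(neigh + 100.0, target + 100.0, cov)  # same configuration, shifted
# w1 equals w2: the weights depend only on relative distances, which is why a
# constant path lets every realization re-use them.
```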
While some works outline the advantages of using a constant path (e.g. Verly, 1993), the overwhelming majority still discourages its use because of the risk of drawing correlated realizations, and instead advocates a randomized path to explore the solution space more homogeneously (e.g. Deutsch and Journel, 1992, Goovaerts, 1997). Conversely, Cáceres et al. (2010) and Boisvert and Deutsch (2011) reported that using a constant path in SGS does not significantly reduce the space of uncertainty for either first- or second-order statistics, while allowing for compelling reductions in computational cost. However, both studies are based on empirical evidence, and hence the generic validity of their findings remains to be verified.
In the present work, we seek to provide a thorough understanding of the implications of changing the simulation path in order to assess the constant path method. The paper is organized as follows. We begin by presenting a methodological description of randomized path simulations (section 2), followed by the implementation of a constant path method (section 3) and the quantification of the associated computational gains (section 4). Finally, we discuss some limitations of the covariance matrix evaluation (section 5).
Theory of randomized-path simulations
In order to understand the implications of generating stochastic realizations based on the same simulation path, the links between the random function (RF) Z, the realizations z, and the path need to be explored in some detail.
Pseudo-code
The simplest modification of the classical SGS (Algorithm 1) allowing for a constant path consists in moving the loop over the realizations inside the loop over the grid nodes, directly after the computation of the kriging weights (Algorithm 2). This simple permutation of loops illustrates why the simulation path P has to remain the same for all realizations.

Algorithm 1. Traditional SGS.

Algorithm 2. SGS with constant path.
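A minimal Python sketch of this loop permutation, assuming an unconditional simple-kriging SGS with a nearest-neighbour search (our own simplified illustration, not the authors' code):

```python
import numpy as np

def sgs_constant_path(coords, n_real, cov, k=8, rng=None):
    """Unconditional SGS in which all realizations share one random path.

    As in Algorithm 2, the loop over realizations sits *inside* the loop
    over grid nodes: the kriging weights of each node are solved once and
    re-used across the n_real realizations.
    """
    rng = np.random.default_rng(rng)
    N = len(coords)
    path = rng.permutation(N)                 # single path for all realizations
    Z = np.zeros((n_real, N))
    for step, idx in enumerate(path):
        if step == 0:                         # first node: unconditional draw
            Z[:, idx] = rng.standard_normal(n_real) * np.sqrt(cov(0.0))
            continue
        prev = path[:step]
        d = np.linalg.norm(coords[prev] - coords[idx], axis=-1)
        neigh = prev[np.argsort(d)[:k]]       # k nearest already-simulated nodes
        C = cov(np.linalg.norm(coords[neigh][:, None] - coords[neigh][None, :], axis=-1))
        c = cov(np.linalg.norm(coords[neigh] - coords[idx], axis=-1))
        w = np.linalg.solve(C, c)             # solved once per node...
        sd = np.sqrt(max(cov(0.0) - w @ c, 0.0))
        # ...then re-used for every realization: one mat-vec plus one draw
        Z[:, idx] = Z[:, neigh] @ w + sd * rng.standard_normal(n_real)
    return Z
```

With a randomized path, by contrast, the neighbourhood search and the linear solve would have to be repeated for every realization.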
Parallelization
Parallelization is easily implemented on the traditional SGS by sending each realization to a different core.
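This per-realization parallelization can be sketched with Python's standard library (illustrative only; `one_realization` is a placeholder for a full SGS run, not the authors' implementation):

```python
from concurrent.futures import ProcessPoolExecutor

def one_realization(seed):
    """Stand-in for a complete SGS realization; each call is independent,
    so realizations can be dispatched to separate cores."""
    import numpy as np
    rng = np.random.default_rng(seed)
    return rng.standard_normal(10)  # placeholder for a simulated grid

def run_parallel(n_real, workers=4):
    """Spread n_real independent realizations over several worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_realization, range(n_real)))
```

Note that this spreads the work without reducing it: the total CPU time stays the same, only the wall-clock time shrinks.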
Simulation errors
In the following, six covariance functions (exponential, Gaussian, spherical, hyperbolic, k-Bessel, and cardinal sine) are considered because of their contrasting behaviour close to the origin and near the range, as well as their popularity. A correlation range of 20 nodes, normalized by the integral value of each covariance function, is used. Because of the large cost of computing the exact covariance matrix, a small grid of 65 × 65 nodes is used in this section.
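Illustrative closed forms for most of these covariance models are given below (unit sill, practical range a; exact parameterizations vary between textbooks, so these are one common convention rather than the paper's definitions):

```python
import numpy as np

# The k-Bessel (Matern) model additionally involves the modified Bessel
# function K_nu (e.g. scipy.special.kv) and is not shown here.
def spherical(h, a):
    t = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return 1.0 - 1.5 * t + 0.5 * t ** 3          # exactly zero beyond h = a

def exponential(h, a):
    return np.exp(-3.0 * np.asarray(h, dtype=float) / a)   # ~5% at h = a

def gaussian(h, a):
    return np.exp(-3.0 * (np.asarray(h, dtype=float) / a) ** 2)

def hyperbolic(h, a):
    return 1.0 / (1.0 + np.asarray(h, dtype=float) / a)

def cardinal_sine(h, a):
    return np.sinc(np.asarray(h, dtype=float) / a)  # sin(pi t) / (pi t)
```

The models differ most in their behaviour at the origin (parabolic for the Gaussian, linear for the exponential and spherical) and in whether they reach zero at a finite range, which is precisely why they are contrasted in this section.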
Unlucky path
An unlucky path is a particular path which leads to an especially poor reproduction of the covariance matrix. Such paths can be identified by comparing the tails of the error distributions in Fig. 2, Fig. 3, Fig. 4. Consistent with the reduction of variance demonstrated in equation (20), the maximal SFN values in Fig. 2 also decrease as the number of simulation paths used increases. Based on Fig. 3, Fig. 4, increasing the neighbourhood size or using a multi-grid approach further reduces the occurrence of unlucky paths.
Conclusions
This study analyzed in detail the benefits and limitations of using a constant simulation path in SGS. Compared to the traditional implementation of SGS, the use of a constant path provides computational gains of several orders of magnitude. While randomizing the path among realizations slightly reduces the covariance errors, this reduction is only significant for a limited number of randomized paths. Most importantly, our analyses demonstrated that, for the majority of our simulations, these covariance errors remain minor.
Acknowledgments
This work has been supported by a grant from the Swiss National Science Foundation.
References

- Boisvert, Deutsch. Programs for kriging and sequential Gaussian simulation with locally varying anisotropy using non-Euclidean distances. Comput. Geosci. (2011)
- Hartman, Hössjer. Fast kriging of large data sets with Gaussian Markov random fields. Comput. Stat. Data Anal. (2008)
- Mariethoz. A general parallelization strategy for random path based geostatistical simulation methods. Comput. Geosci. (2010)
- Nunes, Almeida. Parallelization of sequential Gaussian, indicator and direct simulation algorithms. Comput. Geosci. (2010)
- Pebesma, Wesseling. Gstat: a program for geostatistical modelling, prediction and simulation. Comput. Geosci. (1998)
- Rasera et al. A conflict-free, path-level parallelization approach for sequential simulation algorithms. Comput. Geosci. (2015)
- Sakata et al. An efficient algorithm for kriging approximation and optimization with large-scale sampling data. Comput. Meth. Appl. Mech. Eng. (2004)
- Banerjee et al. Gaussian predictive process models for large spatial data sets. J. Roy. Stat. Soc. B (2008)
- Barry, Kelley Pace. Kriging with large data sets using sparse matrix techniques. Commun. Stat. Simulat. Comput. (1997)
- Billings et al. Interpolation of geophysical data using continuous global surfaces. Geophysics (2002)
- Random Function: a New Formalism for Applied Geostatistics
- Speeding up Conditional Simulation: Using Sequential Gaussian Simulation with Residual Substitution
- Geostatistics. Vol. 497 of Wiley Series in Probability and Statistics
- Cressie, Johannesson. Fixed rank kriging for very large spatial data sets. J. Roy. Stat. Soc. B (2008)
- Deutsch, Journel. GSLIB: Geostatistical Software Library and User's Guide (1992)
- Dimitrakopoulos, Luo. Generalized sequential Gaussian simulation on group size and screen-effect approximations for large field simulations. Math. Geol. (2004)
- Emery, Peláez. Testing the correctness of the sequential algorithm for simulating Gaussian random fields. Stoch. Environ. Res. Risk Assess. (2011)
- Assessing the accuracy of sequential Gaussian simulation and cosimulation. Comput. Geosci.
- Furrer et al. Covariance tapering for interpolation of large spatial datasets. J. Comput. Graph Stat. (2006)
- Geostatistics Tróia '92. Vol. 5 of Quantitative Geology and Geostatistics
- Goovaerts. Geostatistics for Natural Resources Evaluation (1997)