Computers & Geosciences

Volume 112, March 2018, Pages 121-132

Research paper
Accelerating Sequential Gaussian Simulation with a constant path

https://doi.org/10.1016/j.cageo.2017.12.006

Highlights

  • We make the case for the use of a constant path approach in SGS.

  • Randomizing the simulation path brings only a small reduction of the covariance errors.

  • A constant multi-grid path combined with an increased neighbourhood size constitutes the optimal solution.

Abstract

Sequential Gaussian Simulation (SGS) is a stochastic simulation technique commonly employed for generating realizations of Gaussian random fields. Arguably, the main limitation of this technique is the high computational cost associated with determining the kriging weights. This problem is compounded by the fact that often many realizations are required to allow for an adequate uncertainty assessment. A seemingly simple way to address this problem is to keep the same simulation path for all realizations. This results in identical neighbourhood configurations and hence the kriging weights only need to be determined once and can then be re-used in all subsequent realizations. This approach is generally not recommended because it is expected to result in correlation between the realizations. Here, we challenge this common preconception and make the case for the use of a constant path approach in SGS by systematically evaluating the associated benefits and limitations. We present a detailed implementation, particularly regarding parallelization and memory requirements. Extensive numerical tests demonstrate that using a constant path allows for substantial computational gains with very limited loss of simulation accuracy. This is especially the case for a constant multi-grid path. The computational savings can be used to increase the neighbourhood size, thus allowing for a better reproduction of the spatial statistics. The outcome of this study is a recommendation for an optimal implementation of SGS that maximizes accurate reproduction of the covariance structure as well as computational efficiency.

Introduction

Sequential Gaussian Simulation (SGS) is a popular method for generating stochastic values on a grid under the constraints of a statistical model and, possibly, some initially known values, herein referred to as hard data. SGS has been extensively used by practitioners because of its intuitive theoretical basis, its simple numerical implementation, and its high flexibility (e.g. Gómez-Hernández and Journel, 1993, Pebesma and Wesseling, 1998). Arguably, the major drawback of SGS is its computational cost. The exact estimation of kriging relies on taking into account all conditioning nodes, which results in large linear systems that need to be solved. For a square matrix of size n, common linear solvers have a computational complexity of O(n³) (Trefethen and Bau, 1997), which means that the computational effort is proportional to the cube of the matrix dimension. Therefore, the sequential simulation of a grid with N nodes represents an O(N⁴)-type problem (Dimitrakopoulos and Luo, 2004).
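
To make this scaling explicit, node i of the sequence is conditioned on the i − 1 previously simulated nodes (plus any hard data), so its kriging system has size O(i) and solver cost O(i³); summing over the grid gives

∑_{i=1}^{N} O(i³) = O(N²(N + 1)²/4) = O(N⁴).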

Various attempts have been undertaken to reduce the associated computational cost. The most widespread approach is the so-called limited or moving neighbourhood, that is, the approximation of the kriging estimate by using only a limited number of conditioning points referred to as the neighbours (e.g. Isaaks and Srivastava, 1989, Deutsch and Journel, 1992, Goovaerts, 1997). This reduces the computational complexity of SGS to O(k³N), where k denotes the number of neighbours. This approach is rooted in the observation that neighbours located far away from the simulated point receive small or even vanishing weights. This effect originates from the rapid decrease of correlation with distance inherent to most covariance functions and is enhanced by the presence of intermediate neighbours screening the influence of those behind (e.g. Chilès and Delfiner, 1999). However, the omission of neighbours has been shown to bias the simulation covariance matrix (Emery and Peláez, 2011, Nussbaumer et al., 2017), which in turn results in artifacts in the realizations (e.g. Meyer, 2004). Recent works on reducing such detrimental effects, while limiting the neighbourhood size and optimizing the computational efficiency, include those of Gribov and Krivoruchko (2004), Rivoirard and Romary (2011), and Dimitrakopoulos and Luo (2004).
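
As an illustration only (a minimal Python sketch, not the implementation evaluated in this paper), a moving-neighbourhood simple-kriging step reduces to solving a k × k system, hence the O(k³) cost per node; the covariance function cov, the coordinates, and the retained neighbours below are placeholders.

import numpy as np

def simple_kriging_weights(x0, neighbours, cov_fun):
    """Simple-kriging weights and variance at location x0, using only
    the k retained neighbours (k x d array of coordinates)."""
    # k x k covariance matrix between neighbours: an O(k^3) solve instead of O(i^3)
    C = cov_fun(np.linalg.norm(neighbours[:, None, :] - neighbours[None, :, :], axis=-1))
    c0 = cov_fun(np.linalg.norm(neighbours - x0, axis=-1))   # target-to-neighbour covariances
    w = np.linalg.solve(C, c0)                               # kriging weights
    var = cov_fun(0.0) - w @ c0                              # kriging variance
    return w, var

# example with an exponential covariance of practical range 20 (placeholder values)
cov = lambda h: np.exp(-3.0 * np.asarray(h, dtype=float) / 20.0)
neighbours = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
w, var = simple_kriging_weights(np.array([0.0, 0.0]), neighbours, cov)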

An alternative to reducing the size of the kriging covariance matrix is to approximate it. Barry and Kelley Pace (1997) formulate covariance-based kriging, which leads to the inversion of sparse symmetric matrices. Sparse matrix solvers considerably improve the computational performance, but this approach is limited to simulations based on covariance functions with a finite range. Furrer et al. (2006) and Memarsadeghi and Mount (2007) further increase the sparsity of the matrix by tapering the covariance for large lag-distances. Related approaches comprise the approximate iterative method (Billings et al., 2002), the low rank approximation (Kammann and Wand, 2003), the Sherman-Morrison-Woodbury formula (Sakata et al., 2004), and fast summation methods (Memarsadeghi et al., 2008, Srinivasan et al., 2008).
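
To illustrate the tapering idea only (a sketch under simplifying assumptions, not any of the cited implementations), the model covariance can be multiplied by a compactly supported taper so that entries beyond a cut-off distance become exact zeros and the matrix can be handled by sparse solvers; the spherical correlation is used below purely as an example taper, and SciPy's sparse module is assumed to be available.

import numpy as np
from scipy import sparse

def tapered_covariance(coords, cov_fun, taper_range):
    """Model covariance multiplied by a compactly supported (spherical) taper;
    entries beyond taper_range are exact zeros, stored in sparse form."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    taper = np.where(d < taper_range,
                     1.0 - 1.5 * d / taper_range + 0.5 * (d / taper_range) ** 3,
                     0.0)
    # for large grids one would evaluate only pairs within taper_range;
    # the dense intermediate is kept here for brevity
    return sparse.csr_matrix(cov_fun(d) * taper)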

Another approach is to only consider simulations whose covariance function belongs to a limited set of easily solvable covariance models. Omre et al. (1993) propose the screening sequential simulation, which provides exact simulations for covariance models with the Markov property, such as, for example, the exponential model in 1D. Hartman and Hössjer (2008) approximate the simulated Gaussian field with a set of Gaussian Markov random fields (Rue and Tjelmeland, 2002), which can be simulated exactly and efficiently. Finally, Cressie and Johannesson (2008) consider covariance models composed of a fixed number of non-stationary basis functions. This technique is also referred to as fixed-rank kriging. A related approach is the predictive process method (e.g. Banerjee et al., 2008).

A more general technique to cope with the high computational cost of SGS is parallelization, which reduces the computation time by splitting the work among several cores (Vargas et al., 2007, Mariethoz, 2010, Nunes and Almeida, 2010, Rasera et al., 2015). It is important to note that parallelization does not reduce the computational burden, but merely spreads it over several cores, and should therefore be seen as a complement to, rather than a substitute for, the other techniques.

The approach explored in this study aims at decreasing the overall computational cost by taking advantage of the large number of realizations typically needed in geostatistical applications. Indeed, an uncertainty assessment can only be performed with an ensemble of realizations spanning the variability of outcomes. When the simulation path, that is, the order in which the nodes are simulated, is kept identical among multiple realizations, the neighbourhood configuration of each simulated node is also identical throughout these realizations. Because the kriging weights are computed solely from the relative distances between nodes, a constant neighbourhood configuration produces the same kriging weights. Therefore, these weights only need to be computed once and can then be re-used for all realizations. This reduces the computational effort of each additional realization to simple matrix multiplications.
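
In terms of the standard simple-kriging formulation of SGS (assuming a zero-mean, unit-variance Gaussian field), the value simulated at a node u with previously simulated neighbours u_1, …, u_k in realization r is

z_r(u) = ∑_{i=1}^{k} λ_i z_r(u_i) + σ_K ε_r,   with λ = C⁻¹c₀ and σ_K² = C(0) − λᵀc₀,

where C is the k × k covariance matrix of the neighbours, c₀ the covariance vector between u and its neighbours, and ε_r a standard normal deviate. Both λ and σ_K depend only on the neighbourhood geometry and the covariance model; only the conditioning values z_r(u_i) and the deviate ε_r change from one realization to the next.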

While some works outline the advantages of using a constant path (e.g. Verly, 1993), the overwhelming majority still discourages its use because of the risk of drawing correlated realizations, and rather advocates a randomized path to explore the solution space more homogeneously (e.g. Deutsch and Journel, 1992, Goovaerts, 1997). Conversely, Cáceres et al. (2010) and Boisvert and Deutsch (2011) reported that using a constant path in SGS does not result in a significant reduction of the space of uncertainty for either first- or second-order statistics, while allowing for compelling reductions in computational cost. However, both studies are based on empirical evidence and hence the general validity of their findings remains to be verified.

In the present work, we seek to provide a thorough understanding of the implications of changing the simulation path in order to assess the constant path method. The paper is organized as follows. We begin by presenting a methodological description of randomized path simulations (section 2), followed by the implementation of a constant path method (section 3) and the quantification of the associated computational gains (section 4). Finally, we discuss some limitations of the covariance matrix evaluation (section 5).

Section snippets

Theory of randomized path simulations

In order to understand the implications of generating stochastic realizations based on the same simulation path, the links between the random function (RF) Z, the realizations z, and the simulation path p_i need to be explored in some detail.

Pseudo-code

The simplest modification of the classical SGS (Algorithm 1) that allows for a constant path consists of moving the loop over the realizations inside the loop over the grid nodes, immediately after the computation of the kriging weights (Algorithm 2). This permutation of the loops makes clear why the simulation path P has to remain the same for all realizations.

Algorithm 1

Traditional SGS.

Algorithm 2

SGS with constant path.
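
For concreteness, the following is a minimal Python sketch of the loop permutation described above (a simplified illustration, not the authors' implementation): it assumes an unconditional simulation of a zero-mean, unit-variance field with simple kriging and a brute-force search for the k nearest previously simulated nodes. The path and the kriging weights are computed once in the outer loop over nodes; the inner loop over realizations reduces to small matrix-vector products.

import numpy as np

def sgs_constant_path(coords, cov_fun, n_real, k, rng=np.random.default_rng(0)):
    """Constant-path SGS sketch: unconditional, zero-mean, unit-variance field.
    The kriging weights of each node are computed once and re-used for all
    n_real realizations."""
    n = len(coords)
    path = rng.permutation(n)                  # the (constant) simulation path
    z = np.zeros((n_real, n))
    simulated = []                             # nodes already simulated, in path order
    for idx in path:                           # outer loop: grid nodes
        if simulated:
            d = np.linalg.norm(coords[simulated] - coords[idx], axis=-1)
            nb = np.asarray(simulated)[np.argsort(d)[:k]]   # k nearest simulated nodes
            pts = coords[nb]
            C = cov_fun(np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1))
            c0 = cov_fun(np.linalg.norm(pts - coords[idx], axis=-1))
            w = np.linalg.solve(C, c0)                      # weights: computed once
            std = np.sqrt(max(cov_fun(0.0) - w @ c0, 0.0))  # kriging standard deviation
            # inner loop over realizations collapses to a matrix-vector product
            z[:, idx] = z[:, nb] @ w + std * rng.standard_normal(n_real)
        else:
            z[:, idx] = rng.standard_normal(n_real)         # first node of the path
        simulated.append(idx)
    return z

# example: 65 x 65 grid, exponential covariance with practical range 20, 10 realizations
grid = np.stack(np.meshgrid(np.arange(65.0), np.arange(65.0)), axis=-1).reshape(-1, 2)
cov = lambda h: np.exp(-3.0 * np.asarray(h, dtype=float) / 20.0)
realizations = sgs_constant_path(grid, cov, n_real=10, k=20)

In the traditional arrangement (Algorithm 1) the same O(k³) solves would be repeated for every realization; here they are paid only once, and each additional realization costs only O(kN).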

Parallelization

Parallelization is easily implemented for the traditional SGS by sending each realization to a separate core.
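
As an illustration only (not the scheme detailed in the paper), this realization-level parallelism can be sketched with Python's standard multiprocessing module; run_single_sgs is a placeholder standing in for one complete run of Algorithm 1 with its own random seed.

import numpy as np
from multiprocessing import Pool

def run_single_sgs(seed):
    """Placeholder for one complete run of Algorithm 1 with its own seed;
    here it only returns a dummy realization of the right size."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(65 * 65)

if __name__ == "__main__":
    with Pool(processes=4) as pool:            # one realization per core
        ensemble = pool.map(run_single_sgs, range(100))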

Simulation errors

In the following, six covariance functions (exponential, Gaussian, spherical, hyperbolic, k-Bessel, and cardinal sine) are considered because of the variety of their behaviours close to the origin and near the range, as well as their popularity. A correlation range of 20 nodes, normalized by its integral value, is used. Because of the large cost of computing the exact covariance matrix, a small grid of 65 × 65 nodes is used in this section.
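
To give a sense of scale, a 65 × 65 grid already yields a 4225 × 4225 exact covariance matrix (about 1.8 × 10⁷ entries). A minimal sketch of its construction is given below, using the exponential model as an example; the normalization of the range by the integral value used in the paper is omitted here.

import numpy as np
from scipy.spatial.distance import cdist

# 65 x 65 grid -> 4225 nodes, so the exact covariance matrix is 4225 x 4225
coords = np.stack(np.meshgrid(np.arange(65.0), np.arange(65.0)), axis=-1).reshape(-1, 2)
h = cdist(coords, coords)                 # all pairwise lag distances
C_exact = np.exp(-3.0 * h / 20.0)         # exponential covariance, practical range 20 nodes
print(C_exact.shape)                      # (4225, 4225)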

Unlucky path

An unlucky path is a particular path that leads to an especially poor reproduction of the covariance matrix. Such paths can be discussed by comparing the tails of the error distributions in Fig. 2, Fig. 3 and Fig. 4. Consistent with the reduction of variance demonstrated in equation (20), the maximal SFN values in Fig. 2 also decrease as the number of simulation paths used increases. Based on Fig. 3 and Fig. 4, increasing the neighbourhood size or using a multi-grid approach further reduces the occurrence of unlucky paths.

Conclusions

This study analyzed in detail the benefits and limitations of using a constant simulation path in SGS. Compared to the traditional implementation of SGS, the use of a constant path provides computational gains of several orders of magnitude. While randomizing the path among realizations slightly reduces the covariance errors, this reduction is only significant for a limited number of randomized paths. Most importantly, our analyses demonstrated that, for the majority of our simulations, these

Acknowledgments

This work has been supported by a grant from the Swiss National Science Foundation.

References

  • A. Boucher

    Random Function: A New Formalism for Applied Geostatistics

    (2007)
  • A. Cáceres et al.

    Speeding up Conditional Simulation: Using Sequential Gaussian Simulation with Residual Substitution

    (2010)
  • J.-P. Chilès et al.

    Geostatistics. Vol. 497 of Wiley Series in Probability and Statistics

    (1999)
  • N. Cressie et al.

    Fixed rank kriging for very large spatial data sets

    J. Roy. Stat. Soc. B

    (2008)
  • C.V. Deutsch et al.

    GSLIB: Geostatistical Software Library and User's Guide

    (1992)
  • R. Dimitrakopoulos et al.

    Generalized sequential Gaussian simulation on group size and screen-effect approximations for large field simulations

    Math. Geol.

    (2004)
  • X. Emery

    Testing the correctness of the sequential algorithm for simulating Gaussian random fields

    Stoch. Environ. Res. Risk Assess.

    (2004)
  • X. Emery et al.

    Assessing the accuracy of sequential Gaussian simulation and cosimulation

    Comput. Geosci.

    (2011)
  • R. Furrer et al.

    Covariance tapering for interpolation of large spatial datasets

    J. Comput. Graph Stat.

    (2006)
  • J.J. Gómez-Hernández et al.

    Geostatistics Tróia ’92. Vol. 5 of Quantitative Geology and Geostatistics

    (1993)
  • P. Goovaerts

    Geostatistics for Natural Resources Evaluation

    (1997)