Elsevier

Computers & Geosciences

Volume 85, Part A, December 2015, Pages 210-233

Case study
Acceleration of the Geostatistical Software Library (GSLIB) by code optimization and hybrid parallel programming

https://doi.org/10.1016/j.cageo.2015.09.016

Highlights

  • This work is part of an effort to accelerate geostatistical simulation codes.

  • We apply acceleration techniques to a package of legacy geostatistical codes (GSLIB).

  • Acceleration techniques are code optimization and hybrid OpenMP/MPI parallelization.

  • Accelerations were applied to variogram, kriging and sequential simulation.

  • Elapsed time and speedup results are shown.

Abstract

The Geostatistical Software Library (GSLIB) has been used in the geostatistical community for more than thirty years. It was designed as a bundle of sequential Fortran codes, and it is still in use today by many practitioners and researchers. Despite its widespread use, few attempts to bring this package to the multi-core era have been reported. Using all CPU resources, GSLIB algorithms can handle large datasets and grids, where tasks are compute- and memory-intensive. In this work, a methodology is presented to accelerate GSLIB applications using code optimization and hybrid parallel processing, specifically for compute-intensive applications. Minimal code modifications are made, reducing the elapsed execution time of the studied routines as much as possible. If multi-core processing is available, the user can activate OpenMP directives to speed up the execution using all resources of the CPU. If multi-node processing is available, the execution is enhanced using MPI messages between the compute nodes. Four case studies are presented: experimental variogram calculation, kriging estimation, and sequential Gaussian and indicator simulation. For each application, three scenarios (small, large and extra large) are tested using a desktop environment with 4 CPU cores and a multi-node server with 128 CPU-nodes. Elapsed times, speedup and efficiency results are shown.

Introduction

The Geostatistical Software Library (GSLIB), originally presented by Deutsch and Journel (1998), has been used in the geostatistical community for more than thirty years. It contains plotting utilities (histograms, probability plots, Q–Q/P–P plots, scatter plots, location maps), data transformation utilities, measures for spatial continuity (variograms), kriging estimation and stochastic simulation applications. Among these, estimation and simulation are two of the most used components, and can be executed with large data sets and estimation/simulation grids. Large scenarios require several minutes or hours of elapsed time to finish, due to the heavy computations involved and their sequential implementation. Since their original development, these routines have helped many researchers and practitioners in their studies, mainly due to the accuracy and performance delivered by this package. Several efforts have been made to accelerate or extend the scope of the original package, the most relevant being WinGslib (Statios LLC, 2001), SGeMS (Remy et al., 2009) and HPGL (High performance geostatistics library, 2010). SGeMS and HPGL move away from Fortran and implement Python and C/C++ code in conventional and new algorithms. Although there is a significant gain with this change, for many practitioners and researchers the simplicity of Fortran code and the availability of an extensive pool of modified GSLIB-based programs make it hard to abandon this package.

To the authors' knowledge, few efforts have been reported to accelerate the GSLIB package itself, that is, to analyze, optimize and accelerate the original Fortran routines. In this work we present case studies of accelerations performed on original GSLIB routines (in their Fortran 90 versions), using code optimization and multi-core programming techniques. We explain our methodology, in which a performance profile is obtained from the original routine, with the aim of identifying overhead sources in the code. After that, incremental modifications are applied to the code in order to accelerate the execution. OpenMP (Chandra et al., 2001) directives and MPI (Snir et al., 1998) instructions are added in the most time-consuming parts of the routines. Similar experiences in other geostatistical codes have been reported in Straubhaar et al. (2013) and Peredo et al. (2014).
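As a rough illustration of how work might be divided among MPI processes before each one runs its share, the sketch below computes a contiguous block of items for one task. It is written in C rather than the Fortran of the original routines, it makes no MPI calls, and the function name `block_range` and the block-partition scheme itself are assumptions for illustration, not the distribution used by the authors.

```c
#include <assert.h>

/* Hypothetical helper (not part of GSLIB): computes the half-open range
   [*first, *last) of items assigned to one task, e.g. one MPI rank,
   when n_items are split as evenly as possible among n_tasks. */
void block_range(long n_items, int n_tasks, int task, long *first, long *last)
{
    assert(n_items >= 0 && n_tasks > 0 && task >= 0 && task < n_tasks);

    long base  = n_items / n_tasks;  /* items every task receives */
    long extra = n_items % n_tasks;  /* the first `extra` tasks get one more */

    /* Tasks before this one hold `base` items each, plus one extra item
       for each of them that falls among the first `extra` tasks. */
    *first = (long)task * base + (task < extra ? task : extra);
    *last  = *first + base + (task < extra ? 1 : 0);
}
```

In an actual MPI program, `task` and `n_tasks` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`, and each process would loop only over its own range.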

Section snippets

GSLIB structure

According to GSLIB documentation (Deutsch and Journel, 1998), the software package is composed of a set of utility routines, compiled and wrapped as a static library named gslib.a, and a set of applications that call some of the wrapped routines. We will refer to these two sets as utilities and applications. Typically, a main program and two subroutines compose an application (Fig. 1). The first subroutine is in charge of reading the parameters from the input files, and the second subroutine

Methodology

Re-design: First we have to re-design the application/utility code to identify the state of each variable, array or common block during the execution. This step is necessary to enable the user/programmer to identify the scope of each variable (data-flow analysis), in order to insert OpenMP directives into the code in a smooth and easy way.
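To make the role of variable scoping concrete, here is a minimal sketch of an OpenMP loop in which every variable's scope is declared explicitly. It is written in C rather than Fortran (the OpenMP clauses have the same names in both), and the kernel `mean_squared_difference` is a hypothetical example that only loosely resembles the pair accumulation in a variogram calculation; it is not taken from GSLIB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not GSLIB code): mean of squared differences
   between paired values head[i] and tail[i]. */
double mean_squared_difference(const double *head, const double *tail, size_t n)
{
    assert(n > 0);

    long   count = (long)n;
    long   i;
    double diff;       /* scratch value: each thread needs its own copy */
    double sum = 0.0;  /* accumulator: combined across threads */

    /* default(none) makes the compiler reject any variable whose scope
       has not been stated, so the data-flow analysis is written down:
       inputs are shared, scratch is private, the accumulator is a
       reduction. The loop index i is private automatically. */
    #pragma omp parallel for default(none) shared(head, tail, count) \
            private(diff) reduction(+:sum)
    for (i = 0; i < count; ++i) {
        diff = head[i] - tail[i];
        sum += diff * diff;
    }

    return sum / (double)count;
}
```

Compiled without OpenMP support the pragma is ignored and the loop runs sequentially with the same result, which matches the idea of directives that the user can choose to activate.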

Profiling and code optimization: After re-design, we have to study the run-time behavior of the application using a profiler tool. In our case we choose the

Case study

The proposed methodology was applied to accelerate four GSLIB applications: gamv, kt3d, sgsim and sisim. We tested the final versions of the applications on two Linux-based systems: the Server, running the SUSE operating system with multiple nodes of 2×8-core Intel Xeon CPU E5-2670 @ 2.60 GHz interconnected through a fast Infiniband FDR10 network, and the Desktop, running the openSUSE operating system with a single node of 1×4-core Intel Xeon CPU E3-1225 @ 3.10 GHz. All programs were compiled using GCC gfortran

Conclusions and future work

We have shown a methodology to accelerate GSLIB applications and utilities based on code optimizations and hybrid parallel programming using multi-core and multi-node execution with OpenMP directives and MPI task distribution. The methodology was tested in four well-known GSLIB applications: gamv, kt3d, sgsim and sisim. All tests were performed in Linux-based systems. However, no additional external libraries or intrinsic operating system routines were used, so the code could be compiled and

Source code

The current version of the modified codes can be downloaded from https://github.com/operedo/gslib-alges.

Acknowledgements

The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Barcelona Supercomputing Center – Centro Nacional de Supercomputación (Spain) which supports the Marenostrum supercomputer, and the National Laboratory for High Performance Computing (Chile), which supports the Leftraru supercomputer. Additional thanks are owed to industrial supporters of ALGES laboratory, in particular Yamana Gold, as well as the Advanced Mining Technology Center

References (25)

  • C.V. Deutsch et al.

    GSLIB: Geostatistical Software Library and User's Guide

    (1998)
  • S.L. Graham et al.

    gprof: A Call Graph Execution Profiler

    SIGPLAN Not

    (1982)