Introduction

Petroleum geomechanics forms a critical part of reservoir engineering and wellbore construction models (Rhett 1998; Bazyrov et al. 2017; Akbarpour and Abdideh 2020; Mohamadian et al. 2021). Characterizing how stress fields interact with subsurface lithologies and the structures they form requires a comprehensive understanding of the mechanical behavior of the lithologies associated with gas and oil fields. Such an understanding helps to overcome many drilling and field development challenges and to reduce operational costs (Hudson et al. 2005; Rajabi et al. 2022a).

The development of geomechanical models depends on the availability of reliable data from laboratory analysis. This involves mechanical tests on wellbore core samples recovered from the subsurface sedimentary columns penetrated during gas and oil field exploration and development (Khoshouei and Bagherpour 2021; Miah et al. 2021). However, due to the high cost and time associated with wellbore coring operations, few oil or gas field wells are actually sampled by coring. This means that the availability of geomechanical measurements from cores is severely restricted. Consequently, estimates and extrapolations for these parameters have to be used. Many empirical relationships have been developed to compensate for this shortcoming based on the use of petrophysical well-log data (Eberhart-Phillips et al. 1989; Jørstad et al. 1999; Sohail et al. 2020). The basic input requirement for many geomechanical empirical relationships is shear wave velocity (VS) (Ghorbani et al. 2021). Moreover, for cost reasons and the limited geomechanical considerations associated with many historical wells, most wellbore logging suites do not record VS using the advanced and expensive dipole sonic log.

Due to subsurface heterogeneities, geomechanical variables commonly vary across gas and oil reservoir formations and along wellbore profiles (especially in directional/horizontal wells). Consequently, VS often has to be predicted from a few core measurements combined with well-log variables recorded continuously along the wellbore profiles. Machine learning (ML) methods provide an alternative means of making VS predictions that are more reliable than those provided by empirical relationships (Ashraf et al. 2020; Vo Thanh et al. 2020; Ali et al. 2021; Thanh et al. 2022; Vo-Thanh et al. 2022).

The compaction of the reservoir and the consequent subsidence associated with the Ekofisk field (North Sea) imposed substantial additional costs on the field owners. These costs could have been avoided by applying appropriate geomechanical studies to evaluate how the subsurface formations would respond to engineering operations (Dusseault 2011). That field case highlights the necessity of conducting careful geomechanical studies for effective field development, thereby preventing extra operational costs (Fourie and Vawda 1992). However, appropriate geomechanical studies require geomechanical data from the sedimentary sections of interest. Such data can be obtained in two ways. The first is to measure the required data through time-intensive and costly geomechanical laboratory experiments on the available core plugs. This method provides non-continuous geomechanical data (limited to specific points distributed across the sedimentary section) (Stark et al. 2014). The second derives geomechanical data indirectly from petrophysical data, from which valuable rock properties, including porosity, density, and shear/compressional velocity, can be determined (Medetbekova et al. 2021). The latter method is cost-effective because it does not require time-consuming experiments, and it provides a continuous geomechanical dataset across the logged section of a wellbore (Tokeshi et al. 2013). Among the petrophysical logs required for this method, VS is not routinely recorded in every well drilled in oil and gas fields, due to the additional operational cost of the specific logging tool required to record it (Wang et al. 2020). As a result, predictive models for the indirect evaluation of VS can be very useful for conducting geomechanical studies.
Additionally, VS data are valuable for decision-making when selecting drilling locations and wellbore trajectories that maximize well stability and prevent sand production, and when selecting appropriate zones for hydraulic fracturing (Fourie and Vawda 1992; Stark et al. 2014).

There are two conventional ways of estimating VS: (i) predictive models based on rock physics, and (ii) empirical correlation-based relationships (Wang et al. 2019). Rock physics methods use the physical properties of rocks to develop petrophysical models that predict VS; VS is obtained by studying different rock physics models to calculate the rocks’ effective elastic parameters. The factors typically considered in rock physics modeling are porosity, pore shape, fluid inclusion properties, and matrix mineralogy (Wang et al. 2020). Several physics-based models have been developed so far for VS estimation (Xu and White 1995; Sun et al. 2008; Zhang et al. 2012; Guo and Li 2015; Darvishpour et al. 2019; Zhang et al. 2020; Ali et al. 2021). Theoretically, methods based on rock physics models are not limited in application to specific geographic areas or petroleum basins, because they adequately address many of the drawbacks of empirical equations. Nevertheless, most rock physics modeling methods involve very complicated estimation processes because they must make assumptions about pore shapes. Such assumptions tend to reduce, to some degree, the validity of the estimation results. In addition, such models must take into account the matrix elastic parameters, compositions, and mixing mode, together with the effects of pore shapes and fluid constituents, to achieve accurate VS predictions. As a result of these difficulties, models based on rock physics are of low efficiency, and their complexity limits their appeal for real-world drilling and field development applications. Empirical correlation methods have been widely used to estimate VS from compressional wave velocity (VP) because they are quick, simple to apply, and relatively reliable (Bailey and Dutton 2012; Lee 2013; Ojha and Sain 2014; Oloruntobi et al. 2019; Oloruntobi and Butt 2020; Wang et al. 2020). Their reliability stems from the fact that most of the factors affecting VP also influence VS in a similar manner, but to different degrees (Xu and White 1995; Oloruntobi and Butt 2020). Table 1 lists some of the most commonly used empirical equations developed for predicting VS from VP. Recorded VS signals can also be influenced by earthquake effects (Güllü and Pala 2014; Güllü and Jaf 2016; Güllü and Karabekmez 2017).
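To illustrate the form taken by the VP-based correlations, the sketch below inverts one widely cited example, the mudrock line of Castagna et al., VP = 1.16 VS + 1.36 (velocities in km/s). Whether Table 1 uses this particular Castagna et al. relation (rather than, for example, a lithology-specific variant) is not stated here, so the coefficients should be treated as illustrative only.

```python
import numpy as np


def castagna_mudrock_vs(vp_km_s):
    """Estimate VS (km/s) by inverting the mudrock line VP = 1.16*VS + 1.36 (km/s).

    Accepts a scalar or a NumPy array of VP values.
    """
    vp = np.asarray(vp_km_s, dtype=float)
    return (vp - 1.36) / 1.16
```

For example, `castagna_mudrock_vs(4.5)` gives approximately 2.71 km/s. Because the mudrock line was calibrated on clastic rocks, applying it to carbonate intervals such as the Asmari illustrates why single-variable VP relations can transfer poorly between lithologies.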

Table 1 Published common empirical correlations used to predict shear wave velocity (VS)

The fact that most empirical correlations for VS prediction involve only VP (Table 1) limits their accuracy and tends to make them field- or basin-specific. The results of these empirical equations are considerably influenced by lithology, which may lead to inadequate prediction accuracy (Akhundi et al. 2014; Güllü and Jaf 2016). In addition, their lack of generalizability to other fields and their poor fit to real data across an entire sedimentary section limit the confidence with which such relationships can be applied (Güllü and Pala 2014; Güllü and Jaf 2016; Gholami et al. 2020; Oloruntobi and Butt 2020; Rajabi et al. 2021; Rajabi et al. 2022a). In recent years, the much-improved computational efficiency and prediction accuracy of ML methods have led to their application in predicting VS from well-log input data (Eskandari et al. 2004; Rezaee et al. 2007; Rajabi et al. 2010; Asoodeh and Bagheripour 2013, 2014; Gholami et al. 2014; Maleki et al. 2014; Oloruntobi et al. 2019; Gholami et al. 2020; Wang et al. 2020; Zhang et al. 2020). The datasets used in those models are typically verified with just a few core measurements and, in some cases, include seismic data; details are listed in Table 2 (Al-Dousari et al. 2016). However, as ML and deep learning (DL) methods improve and more extensive datasets become available from around the globe, much scope remains to improve VS prediction accuracy (Wang et al. 2020; Wood 2020). Moreover, the possibility exists to make the methodologies more robust and generalizable within hydrocarbon fields and across sedimentary basins.

Table 2 ML techniques previously proposed for predicting VS. See

In this paper, three recently developed techniques are applied and evaluated to predict VS for several wells drilled in a giant oil field with both carbonate and sandstone reservoirs, using data from standard well logs (Fig. 1). These include two hybrid machine learning (HML) techniques: a multi-hidden-layer extreme learning machine hybridized with a particle swarm optimizer (MELM-PSO), and MELM hybridized with a genetic algorithm (MELM-GA). The third technique is a DL convolutional neural network (CNN) model. Machine and deep learning algorithms have recently been applied as robust computational tools across many engineering fields to solve a wide range of problems. The main novelty of this study is to develop, apply, and compare VS predictions from these three techniques applied to a large multiple-well dataset from a giant oil field. The VS prediction performance of the DL and HML algorithms is also compared, for the same dataset, with commonly used empirical VS prediction models. This full-scale comparison between the hybrid machine learning models and a deep learning model identifies the most effective and accurate model for predicting shear wave velocity. As a verification measure, we also address possible concerns about the integrity and repeatability of the proposed machine learning models by applying them to data from another well in the field. As a fast and very low-cost solution compared to other available methods, the technique involves only minor disadvantages. The available computer processing power constrains the number of data records and log variables that these models can process. Additionally, the quality of the recorded standard logs is important: poor-quality log data will result in higher VS prediction errors.
The method’s advantages outweigh its disadvantages, and the HML and DL models developed can be defined as reference classes or libraries for general use.

Fig. 1

Schematic diagram outlining the technique to predict VS data from a standard suite of well logs by applying a deep learning prediction model

Methods

Workflow

A workflow diagram (Fig. 2) summarizes the sequence of construction and evaluation steps involved in applying the DL and HML algorithms to predict VS and in establishing the prediction accuracy achieved. The process begins with compiling a dataset and statistically assessing the value distribution of each of its component variables. The maximum and minimum values of each data variable (attribute) are used to normalize its values to the range −1 to +1. Normalization is achieved using Eq. (1) and is important because it avoids scaling biases in the learning processes adopted by the DL and HML algorithms (Kamali et al. 2022).

$$x_{i,\text{norm}}^{l} = 2\left( \frac{x_{i}^{l} - x_{\text{min}}^{l}}{x_{\text{max}}^{l} - x_{\text{min}}^{l}} \right) - 1$$
(1)

where:

\(x_{i,\text{norm}}^{l}\) = the normalized value of attribute \(l\) for data record \(i\);

\(x_{i}^{l}\) = the value of attribute \(l\) for data record \(i\);

\(x_{\text{min}}^{l}\) = the minimum value of attribute \(l\) among all the data records in the dataset; and,

\(x_{\text{max}}^{l}\) = the maximum value of attribute \(l\) among all the data records in the dataset.

Fig. 2

Workflow schematic for comparing the VS prediction performance of HML and DL algorithms
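A minimal NumPy sketch of the min–max scaling of Eq. (1), applied column by column to an array with one row per data record and one column per attribute. Returning the minimum and maximum so that the same bounds can be reapplied to other data, and the guard against constant columns, are our own additions rather than details stated in the source.

```python
import numpy as np


def scale_to_minus1_plus1(X, x_min=None, x_max=None):
    """Map each column of X to [-1, +1] following Eq. (1).

    If x_min / x_max are not given, they are taken from X itself
    (one value per attribute). They are returned so they can be reused.
    """
    X = np.asarray(X, dtype=float)
    if x_min is None:
        x_min = X.min(axis=0)
    if x_max is None:
        x_max = X.max(axis=0)
    # Avoid division by zero for a constant attribute (assumed handling).
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return 2.0 * (X - x_min) / span - 1.0, x_min, x_max
```

A design note: if the bounds were instead computed from the training records only, applying them to another well could produce values slightly outside [−1, +1]; the source computes them over the whole dataset.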

The normalized data records are then assigned to either a training subset or a testing subset. Trial and error tests indicate that an approximate 70%:30% split of data records between training and testing subsets works well for most reasonably sized datasets. The testing subset of data records is held independently of the training subset and is not involved in the algorithms’ training processes. A K-fold method is used to sample the training subset for validation purposes. Statistical measures of accuracy are then used to assess the VS prediction performance of each DL and HML algorithm evaluated to establish their relative VS prediction capabilities.
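The approximate 70%:30% allocation can be sketched as a random, non-overlapping split of record indices. Random shuffling (rather than, say, depth-contiguous blocks) and the fixed seed are assumptions of this sketch; the source does not specify how records were assigned.

```python
import numpy as np


def train_test_split_indices(n_records, train_fraction=0.7, seed=0):
    """Return disjoint (train_idx, test_idx) arrays covering range(n_records)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_records)          # shuffled record indices
    n_train = int(round(train_fraction * n_records))
    return perm[:n_train], perm[n_train:]
```

For the 6622 records from wells MN#163 and MN#225, this yields 4635 training and 1987 testing indices.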

Machine-learning (ML) algorithms

ML algorithms are now usefully applied to solve many oil and gas operational and prediction challenges, including drilling, reservoir performance, and geomechanical characterization (Gullu 2017; Ashraf et al. 2020; Ashraf et al. 2021; Ranaee et al. 2021). ML algorithms are well suited to problems involving multiple variables with nonlinear relationships and complex value distributions (Gullu 2017; Hazbeh et al. 2021b). Artificial neural network (ANN), extreme learning machine (ELM), support vector machine (SVM) and other algorithms based mainly on regression/correlation relationships have been successfully applied to progressively improve the prediction of variables relevant to the petroleum industry (Farsi et al. 2021b). Several researchers have recently applied ML algorithms specifically to VS prediction (Weijun et al. 2017; Azadpour et al. 2020; Zhang et al. 2020; Zhang et al. 2021; Miah 2021; Olayiwola et al. 2021; Zhong et al. 2021; Ebrahimi et al. 2022).

Single machine-learning (SML) algorithms

Extreme learning machine (ELM)

ELM is a rapidly executed feed-forward neural network (Huang et al. 2006). It can be usefully applied to reduce learning time, improve accuracy, and increase generalizability (Huang et al. 2006; Huang et al. 2011; Huang 2014; Wang et al. 2014; Cheng and Xiong 2017; Naveshki et al. 2021; Zhang et al. 2022). ELM differs from an ANN trained with back-propagation or other optimization algorithms in that the ELM’s hidden-layer parameters (weights and biases) are assigned randomly. This saves computational time because these hidden-layer parameters do not need to be adjusted during ELM training. The output weights are determined by applying the Moore–Penrose generalized inverse to the hidden-layer output matrix (Yeom and Kwak 2017). The structure of a simple ELM (with a single hidden layer) is shown in Fig. 3.

Fig. 3

Schematic architecture of an extreme learning machine (ELM) with a single hidden layer. Modified with permission from ref. (Abad et al. 2021a)

ELM performance for complex problems can be improved by introducing more than one hidden layer. The multi-layer ELM algorithm is configured as follows:

  • Step 1: Determine the number of hidden layers (l) and neurons in each layer.

  • Step 2: Take (X, Y) = {(xi, yi) | i = 1, 2, 3, …, Q} as the training data, where X is the matrix of input variable values for each data record and Y is the output variable vector including all data records.

  • Step 3: Each hidden layer has n neurons and an activation function g(x). The weights between layers i and (i − 1) and the biases applied to layer i are randomly generated.

  • Step 4: Calculate \({\text{W}}_{{{\text{IE}}}} = \left[ {\text{B W}} \right],{\text{ X}}_{{\text{E}}} = \left[ {1{\text{ X}}} \right]^{{\text{T}}}\).

  • Step 5: Calculate the H matrix with Eq. (2):

    $${\text{H}} = {\text{g}}\left( {{\text{W}}_{{{\text{IE}}}} {\text{X}}_{{\text{E}}} } \right)$$
    (2)
  • Step 6: If i is less than l, calculate Eq. (3) and return to step three. Otherwise go to the next step.

    $${\text{X}} = {\text{H}}^{{\text{T}}} , {\text{i}} = {\text{i}} + 1$$
    (3)
  • Step 7: The output weights are calculated based on the Moore–Penrose inverse by applying Eq. (4):

    $${\upbeta } = {\text{pinv}}\left( {{\text{H}}^{{\text{T}}} } \right) \times {\text{Y}}$$
    (4)
  • Step 8: The output prediction is calculated with Eq. (5):

    $$\hat{Y} = \left( {H^{T} \times \beta } \right)^{T}$$
    (5)
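Steps 1 to 8 can be sketched in NumPy as follows, keeping the matrix orientation of Eqs. (2) to (5), in which X_E holds one data record per column. The tanh activation and the uniform [−1, 1] range for the random weights and biases are assumptions, since the source does not specify g(x) or the sampling range. With a single entry in `neurons_per_layer`, the sketch reduces to the single-hidden-layer ELM of Fig. 3.

```python
import numpy as np


def _forward_hidden(X, weights):
    """Propagate records through the random hidden layers (Steps 4 to 6)."""
    A = X
    for W_IE in weights:
        X_E = np.hstack([np.ones((A.shape[0], 1)), A]).T  # Step 4: X_E = [1 X]^T
        H = np.tanh(W_IE @ X_E)                           # Eq. (2), g = tanh (assumed)
        A = H.T                                           # Eq. (3): X = H^T
    return A  # H^T of the last hidden layer, shape (Q, n_last)


def melm_fit(X, y, neurons_per_layer, seed=0):
    """Draw random [B W] matrices per layer and solve the output weights."""
    rng = np.random.default_rng(seed)
    weights, n_in = [], X.shape[1]
    for n in neurons_per_layer:                           # Steps 1 and 3
        # W_IE = [B W]: first column holds the biases of this layer.
        weights.append(rng.uniform(-1.0, 1.0, size=(n, n_in + 1)))
        n_in = n
    beta = np.linalg.pinv(_forward_hidden(X, weights)) @ y  # Eq. (4)
    return weights, beta


def melm_predict(X, weights, beta):
    """Eq. (5); for a 1-D target the final transpose is a no-op."""
    return _forward_hidden(X, weights) @ beta
```

Because only `beta` is solved (in closed form), repeated calls with different seeds give different models; this run-to-run variability is what the GA and PSO hybrids described later are used to control.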
Genetic algorithm (GA)

GA is an evolutionary algorithm developed in the 1960s and inspired by the principles of genetics, involving functions that mimic inheritance, mutation, selection, and recombination (crossover). It establishes an initial population of randomly generated artificial “chromosomes” (Mohamadian et al. 2021). The chromosomes are evaluated over a number of evolutionary iterations (generations) with a cost function, which is progressively minimized. To determine the attributes of the next generation of “chromosomes”, the members of the current generation are ranked by cost, and only the best-performing ones are “selected” to participate in reproduction (elitism). Crossover and mutation operations, with an assigned degree of randomness, are then involved in producing the next generation. This randomness helps the GA avoid being trapped at local minima, enabling it to explore the feasible solution space thoroughly. Figure 4 illustrates the GA process in the form of a flowchart.
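The generational loop described above can be sketched as a minimal real-valued GA. Truncation selection of an elite, blend crossover and Gaussian mutation are our own illustrative choices, and the default control values (population size, number of elites, mutation rate and scale) are hypothetical, not the values listed in Table 3.

```python
import numpy as np


def genetic_algorithm(cost, lower, upper, pop_size=40, n_generations=100,
                      n_elite=8, mutation_rate=0.1, mutation_scale=0.1, seed=0):
    """Minimize cost(x) over the box [lower, upper]; returns (best, best_cost, history)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pop = rng.uniform(lower, upper, size=(pop_size, dim))   # random initial chromosomes
    history = []
    for _ in range(n_generations):
        costs = np.array([cost(c) for c in pop])
        order = np.argsort(costs)                            # rank by cost
        pop, costs = pop[order], costs[order]
        history.append(costs[0])
        elites = pop[:n_elite]                               # selected parents, kept unchanged
        children = []
        while len(children) < pop_size - n_elite:
            p1, p2 = elites[rng.choice(n_elite, size=2, replace=False)]
            alpha = rng.uniform(size=dim)
            child = alpha * p1 + (1.0 - alpha) * p2          # blend crossover
            noise = rng.normal(0.0, 1.0, size=dim) * mutation_scale * (upper - lower)
            mutate = rng.uniform(size=dim) < mutation_rate   # genes chosen for mutation
            child = np.where(mutate, child + noise, child)
            children.append(np.clip(child, lower, upper))
        pop = np.vstack([elites] + children)                 # next generation
    costs = np.array([cost(c) for c in pop])
    best = int(np.argmin(costs))
    history.append(costs[best])
    return pop[best], costs[best], history
```

Because the elites are carried forward unchanged, the best cost recorded in `history` can never increase from one generation to the next.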

Fig. 4

Flowchart showing the execution sequence of a genetic algorithm (GA) optimizer

Particle swarm optimization (PSO) algorithm

Figure 5 illustrates the PSO algorithm in the form of a flowchart. PSO searches the feasible solution space using a population (swarm) of particles whose adjusted movements are inspired by those of flocks of birds or shoals of fish. The initial positions of the particles are set randomly in the search space, which is defined by the minimum and maximum values of the decision variables. From one iteration to the next, each particle moves in different directions at speeds between a lower limit (Vmin) and an upper limit (Vmax). The successive positions of each particle are recorded, and its best historical position is stored as a “personal best” (Pb), which partially determines its subsequent movements.

Fig. 5

Flowchart showing the execution sequence of a particle swarm optimizer (PSO). Modified with permission from ref. (Rashidi et al. 2021)

The positions of all particles are evaluated with the objective (cost) function, and the position with the lowest cost function value is identified in each iteration as the best global position (Gb). In each iteration, a new velocity (Vi (t + 1)) for each particle (i) is calculated from its previous velocity (Vi (t)) and from the distances between the particle’s current position (xi (t)) and both its best historical personal position and the best global position achieved by the swarm so far (Eq. (6)). Subsequently, the new position of each particle (xi (t + 1)) is calculated from its prevailing position and the newly calculated velocity (Eq. (7)).

$${\text{V}}_{{\text{i}}} \left( {{\text{t}} + 1} \right) = {\text{wV}}_{{\text{i}}} \left( {\text{t}} \right) + {\text{c}}_{1} {\text{r}}_{1} \left( {{\text{Pb}}_{{\text{i}}} \left( {\text{t}} \right) - {\text{x}}_{{\text{i}}} \left( {\text{t}} \right)} \right) + {\text{c}}_{2} {\text{r}}_{2} \left( {{\text{G}}_{{\text{b}}} \left( {\text{t}} \right) - {\text{x}}_{{\text{i}}} \left( {\text{t}} \right)} \right)$$
(6)
$${\text{x}}_{{\text{i}}} \left( {{\text{t}} + 1} \right) = {\text{x}}_{{\text{i}}} \left( {\text{t}} \right) + {\text{V}}_{{\text{i}}} \left( {{\text{t}} + 1} \right)$$
(7)

where:

i = 1, 2, …, n is the particle index, where n is the number of particles in the swarm;

w = inertia weight, which controls how much of the previous velocity Vi (t) is carried over into the new velocity (Pedersen and Chipperfield 2010; Jafarizadeh et al. 2022);

c1, c2, are positive-valued personal (cognitive) and collective (social) learning coefficients, respectively (Coello et al. 2007); and,

r1, r2 are random numbers in the range [0,1].

The new position of each particle is then re-evaluated with the cost function. The PSO algorithm is well suited to efficiently explore continuous solution spaces without becoming easily trapped at local minima.
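A compact NumPy sketch of the update rules in Eqs. (6) and (7). Taking Vmin as −Vmax, setting Vmax as a fraction of each variable's range, and clipping positions to the search bounds are assumptions of this sketch; the default w, c1 and c2 values are illustrative, not the values listed in Table 4.

```python
import numpy as np


def pso(cost, lower, upper, n_particles=30, n_iter=100, w=0.7, c1=1.5, c2=1.5,
        v_max_frac=0.2, seed=0):
    """Minimize cost(x) over the box [lower, upper]; returns (Gb, Gb_cost, history)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    v_max = v_max_frac * (upper - lower)                  # Vmax per variable; Vmin = -Vmax
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = rng.uniform(-v_max, v_max, size=(n_particles, dim))
    pb = x.copy()                                         # personal best positions
    pb_cost = np.array([cost(p) for p in x])
    g = pb[np.argmin(pb_cost)].copy()                     # global best position
    g_cost = pb_cost.min()
    history = [g_cost]
    for _ in range(n_iter):
        r1 = rng.uniform(size=(n_particles, dim))
        r2 = rng.uniform(size=(n_particles, dim))
        v = w * v + c1 * r1 * (pb - x) + c2 * r2 * (g - x)   # Eq. (6)
        v = np.clip(v, -v_max, v_max)                        # keep speeds in [Vmin, Vmax]
        x = np.clip(x + v, lower, upper)                     # Eq. (7), kept in bounds
        costs = np.array([cost(p) for p in x])
        improved = costs < pb_cost
        pb[improved] = x[improved]
        pb_cost[improved] = costs[improved]
        if pb_cost.min() < g_cost:
            g_cost = pb_cost.min()
            g = pb[np.argmin(pb_cost)].copy()
        history.append(g_cost)
    return g, g_cost, history
```

Since Gb is only replaced when a strictly better personal best appears, the recorded global best cost is non-increasing across iterations.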

HML algorithm configurations

Multi-layer extreme learning machine (MELM) hybridized with optimizers

MELM performance depends on the number of hidden layers included and the number of neurons in each of those layers. The appropriate MELM structure varies according to the complexity of the dataset (Rashidi et al. 2021): the more complex the problem, the more hidden layers and neurons are required. However, the more layers and neurons involved, the longer the computational time. Therefore, optimizing the MELM structure can lead to a high-precision model with an efficient learning process and relatively short computational requirements. A trial-and-error method can be used to determine the appropriate structures of multilayer ANN and MELM, but this can be very time-consuming. Therefore, in this study the PSO algorithm is used to determine the number of MELM hidden layers and the number of neurons in each layer. In addition, because the MELM weights and biases are selected randomly, different answers may be obtained each time the algorithm is executed. To solve this problem, the MELM algorithm is combined with an optimizer (GA or PSO) that first identifies the optimum values of these parameters (Fig. 6).

Fig. 6

Flowchart for implementing the hybrid MELM-PSO and MELM-GA applied to predict VS. Modified with permission from ref. (Abad et al. 2021a)

GA and PSO optimization algorithms have adjustable hyperparameters (control values) that influence the efficiency of their performance. Trial-and-error methods were used to determine these control values (Tables 3 and 4). A total of 50 iterations of the optimizers were used to identify the optimum number of layers and neurons in the MELM, whereas 200 iterations (Tables 3 and 4) of the optimizers were used to optimize the weights and biases of the MELM-GA and MELM-PSO hybrid models (Abad et al. 2022).

Table 3 GA control parameter values applied in the MELM-GA algorithm
Table 4 PSO control parameter values applied in the MELM-PSO algorithm

The K-fold cross-validation technique was applied, with a tenfold setup, to achieve more stable and reliable VS prediction results when determining the number of MELM layers and neurons. This divides the training subset into ten equal portions. The model is then evaluated ten times, each execution using nine portions of the data records for training and the remaining portion for validation (Fig. 7). Each of the ten portions is therefore used once as the validation subset.
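The tenfold partitioning can be sketched as a generator of (training, validation) index pairs. Shuffling the records before splitting, and letting fold sizes differ by one record when the count is not divisible by ten, are assumptions of this sketch.

```python
import numpy as np


def kfold_indices(n_records, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each record is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_records), k)  # k near-equal portions
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

The cross-validated score for a candidate structure would then be, for example, the mean validation RMSE over the ten pairs.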

Fig. 7

K-fold cross-validation applied in the training phase (tenfold used) and values obtained then used to evaluate the testing subset

Table 5 shows the provisional VS prediction results for different MELM structures, established by trial and error, using the tenfold cross-validation technique. They indicate that MELM with between 2 and 6 hidden layers and with between 6 and 10 neurons achieves the lowest RMSE for VS predictions. In order to save computational time, the optimizers were therefore constrained to vary MELM layers between values of 2 and 6 and the number of neurons between 6 and 10.
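Because PSO and GA search continuous spaces, one way to impose the structural constraints above is to decode each candidate solution into an integer MELM structure. The encoding below (one entry for the layer count followed by one entry per potential layer, each scaled to [0, 1]) is a hypothetical illustration; the source does not describe how the structure was encoded.

```python
def decode_melm_structure(position, layer_range=(2, 6), neuron_range=(6, 10)):
    """Map a vector in [0, 1]^(1 + max_layers) to a list of neurons per hidden layer."""
    lo_l, hi_l = layer_range
    lo_n, hi_n = neuron_range

    def unit(p):
        # Clip each coordinate to [0, 1] before scaling.
        return min(max(float(p), 0.0), 1.0)

    n_layers = lo_l + int(round(unit(position[0]) * (hi_l - lo_l)))
    return [lo_n + int(round(unit(p) * (hi_n - lo_n)))
            for p in position[1:1 + n_layers]]
```

For example, `decode_melm_structure([0.5, 0.0, 0.25, 1.0, 0.9, 0.9, 0.9])` returns `[6, 7, 10, 10]`, i.e. four hidden layers.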

Table 5 The values of RMSE (VS in km/s) for the different number of hidden layers and neurons in the layers for the MELM models developed for VS prediction

Deep learning

Convolutional neural network (CNN)

CNNs have demonstrated their capabilities in diverse applications in recent years, including image recognition (Krizhevsky et al. 2017), reading comprehension (Yu et al. 2018), and reinforcement learning of game strategy (Silver et al. 2016; Abad et al. 2021a). A CNN uses convolutional (weight-sharing) layers instead of the fully connected layers of traditional neural networks such as ANN and ELM (Abad et al. 2021b). Weight sharing reduces the number of trainable weights that a CNN requires compared with a fully connected network, and often enables it to generate higher-resolution predictions from fewer training data records for specific problems.

Figure 8 shows a generic CNN structure. It has several parallel filters acting on the input data records that can be configured to extract different features. The input vector is filtered by each of the CNN filter layers, with each layer producing its own output vector; therefore, the dimensions of the network increase with the number of filter layers selected. A pooling layer is then used to reduce the dimensions and normalize the selected variables, feeding that data into the concatenate layer. This information is then fed into the dense layer(s) to generate the final output. The dense layer (like a multilayer perceptron neural network) is made up of a number of neurons, determined by the user (trial and error) or by an optimizer. The model is trained to establish the weights and biases of the neurons in the dense layers that achieve the highest prediction accuracy for the dependent variable.

Fig. 8

Schematic illustration of the structure of a deep learning convolutional neural network (CNN). Modified with permission from ref. (Abad et al. 2021b)

Several hyperparameters need to be set when developing a CNN model. For the CNN constructed in this study to predict VS, based on trial and error, the number of filters was set to 200, a kernel size (convolutional window length) of 3 was selected, the rectified linear unit (ReLU) activation function was applied, and the number of neurons in the dense layer was set to 100.
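The layer sequence described above (convolution, pooling, dense layer, output), with 200 filters, a kernel size of 3, ReLU activation and 100 dense neurons, can be sketched as an untrained NumPy forward pass. This is a simplified single-branch version of Fig. 8: treating the six selected logs as a one-channel sequence of length 6, max-pooling with a window of 2, a flatten step in place of the concatenate layer, ReLU in the dense layer and He-style random initial weights are all assumptions. Fitting the weights to minimize RMSE would normally be done with a deep learning framework and is not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def init_cnn(n_features=6, n_filters=200, kernel_size=3, n_dense=100, seed=0):
    """Random (untrained) parameters for a Conv1D -> MaxPool -> Dense -> output net."""
    rng = np.random.default_rng(seed)
    n_pooled = (n_features - kernel_size + 1) // 2   # sequence length after pooling (size 2)
    n_flat = n_pooled * n_filters
    return {
        "K": rng.normal(0, np.sqrt(2 / kernel_size), size=(n_filters, kernel_size)),
        "b_conv": np.zeros(n_filters),
        "W_dense": rng.normal(0, np.sqrt(2 / n_flat), size=(n_flat, n_dense)),
        "b_dense": np.zeros(n_dense),
        "W_out": rng.normal(0, np.sqrt(1 / n_dense), size=(n_dense, 1)),
        "b_out": np.zeros(1),
    }


def relu(z):
    return np.maximum(z, 0.0)


def cnn_forward(X, p):
    """X: (batch, n_features) normalized logs -> (batch,) predicted VS (untrained)."""
    kernel_size = p["K"].shape[1]
    windows = sliding_window_view(X, kernel_size, axis=1)     # (batch, L, kernel)
    conv = relu(windows @ p["K"].T + p["b_conv"])             # (batch, L, filters)
    L = conv.shape[1] // 2 * 2                                # drop an odd trailing step
    pooled = conv[:, :L].reshape(X.shape[0], L // 2, 2, -1).max(axis=2)  # max-pool, size 2
    flat = pooled.reshape(X.shape[0], -1)
    dense = relu(flat @ p["W_dense"] + p["b_dense"])          # 100-neuron dense layer
    return (dense @ p["W_out"] + p["b_out"]).ravel()          # single regression output
```

With six inputs, the convolution produces 4 positions × 200 filters, pooling reduces this to 2 × 200, and the flattened 400 values feed the dense layer.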

Statistical measures of prediction accuracy

The VS prediction performance of the HML, DL and empirical models is compared by calculating widely used statistical measures of prediction accuracy, expressed in Eqs. (8) to (15).

Percentage deviation (PD) or relative error (RE):

$${\text{PD}}_{i} = \frac{H_{{\text{Measured}}_{i}} - H_{{\text{Predicted}}_{i}}}{H_{{\text{Measured}}_{i}}} \times 100$$
(8)

Average percentage deviation (APD):

$${\text{APD}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} PD_{i} }}{n}$$
(9)

Absolute average percentage deviation (AAPD):

$${\text{AAPD}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left| {PD_{i} } \right|}}{n}$$
(10)

Standard Deviation (SD):

$${\text{SD}} = \sqrt{\frac{\sum_{i=1}^{n} \left( D_{i} - D_{\text{mean}} \right)^{2}}{n - 1}}$$
(11)
$$D_{i} = H_{{\text{Measured}}_{i}} - H_{{\text{Predicted}}_{i}}, \qquad D_{\text{mean}} = \frac{1}{n}\sum_{i=1}^{n} D_{i}$$
(12)

Mean Square Error (MSE):

$${\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n} \left( H_{{\text{Measured}}_{i}} - H_{{\text{Predicted}}_{i}} \right)^{2}$$
(13)

Root Mean Square Error (RMSE):

$${\text{RMSE}} = \sqrt {{\text{MSE}}}$$
(14)

Coefficient of Determination (R2):

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} \left( H_{{\text{Measured}}_{i}} - H_{{\text{Predicted}}_{i}} \right)^{2}}{\sum_{i=1}^{n} \left( H_{{\text{Measured}}_{i}} - \frac{1}{n}\sum_{j=1}^{n} H_{{\text{Measured}}_{j}} \right)^{2}}$$
(15)

These indicators of prediction accuracy are best considered together rather than individually as they all reveal complementary information and insight into the prediction performance of the algorithms considered. RMSE is used as the objective function for the HML and DL models, making it the single most important measure, as those algorithms are configured to minimize RMSE.
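The accuracy measures can be computed together in a few lines of NumPy, which also makes the relationship between them explicit (for example, RMSE is the square root of MSE, and SD is the sample standard deviation of the errors D_i). R² is computed here in the standard form 1 − SS_res/SS_tot, with SS_tot taken about the mean of the measured values; the function name and dictionary keys are our own.

```python
import numpy as np


def accuracy_metrics(measured, predicted):
    """Return PD-based and error-based accuracy measures for two equal-length arrays."""
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    d = m - p                                   # D_i = H_Measured_i - H_Predicted_i
    pd = d / m * 100.0                          # Eq. (8), percent
    mse = np.mean(d ** 2)                       # Eq. (13)
    return {
        "APD": pd.mean(),                       # Eq. (9)
        "AAPD": np.abs(pd).mean(),              # Eq. (10)
        "SD": np.std(d, ddof=1),                # Eqs. (11)-(12), n - 1 denominator
        "MSE": mse,
        "RMSE": np.sqrt(mse),                   # Eq. (14), objective minimized in training
        "R2": 1.0 - np.sum(d ** 2) / np.sum((m - m.mean()) ** 2),  # Eq. (15)
    }
```

For measured VS values of 2.0, 2.5 and 3.0 km/s and predictions of 1.9, 2.5 and 3.3 km/s, this gives AAPD = 5.0%, APD ≈ −1.67% and R² = 0.8. The negative APD alongside a much larger AAPD shows why the signed and absolute measures are best read together.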

Data collection and characterization

Marun field description

To predict VS, well-log data from three wellbores drilled in the Marun oil field (MN#163, MN#225 and MN#179) are evaluated. This giant oil field is located onshore in southwestern Iran (Fig. 9). It was discovered in 1963 and is one of the largest oil fields in the Zagros Basin, with two producing oil reservoirs: the Asmari (Oligocene to Early Miocene) and Bangestan (Upper Cretaceous) formations. Collectively, these reservoirs contain in-place oil resources of some 46 billion barrels. In addition, the Khami (Lower Cretaceous) formation forms an underlying natural gas reservoir with some 462 trillion cubic feet of gas-in-place.

Fig. 9

Marun oil field located onshore Iran in the Zagros Basin. Reproduced with permission from ref. (Rashidi et al. 2020)

Data collection and data distribution

Well-log datasets compiled for wells MN#163, MN#225 and MN#179 sample the Asmari carbonate reservoir. Data records from two of the wells (MN#163 and MN#225) were used for supervised training and validation of the DL and HML algorithms in terms of VS prediction accuracy. Data from well MN#179 was then used as an independent testing subset to test the models for VS prediction accuracy with data previously unseen by the trained and validated model.

The well-log variables used as input features for the VS prediction models are gamma ray (GR); compressional-wave velocity (VP); bulk density (RHOB); neutron porosity (NPHI); shallow resistivity (RES-SHT); medium resistivity (RES-MED); deep resistivity (RES-DEP) and caliper (CP). Table 6 statistically summarizes the distributions of the nine variables involved (8 input plus VS as dependent variable) sampled from the Asmari reservoir sections penetrated by the three wells: MN#163 (3793 data records), MN#225 (2829 data records) and MN#179 (2072 data records), constituting 8694 data records in total.

Table 6 Statistical characterization of the data variables constituting the well-log dataset for three Marun-oil-field wells: MN#163, MN#225 and MN#179

The ranges of the data variables covered by the Asmari reservoir well-log samples are substantial (Table 6). For instance, the VS range evaluated extends from 1.40 km/s to 3.15 km/s across the three wells considered. This highlights the lithological variety within the Asmari reservoir, which includes limestone, dolomite, shale, siltstone, sandstone and evaporite layers.

The best subset of input variables was selected by evaluating the correlation coefficient between each input variable and the measured VS values. The input variables displaying the highest correlation coefficients were selected for VS modeling. Figure 13 shows that four input variables, VP, GR, RHOB, and NPHI, have the highest correlation coefficients with VS. The HML and DL models were initially built using these four selected features. The impact of the other potential input variables was then evaluated by adding them, one at a time, to the selected feature subset to predict VS. This analysis revealed that adding RES-DEP and RES-SHT to the four features originally selected on the basis of correlation coefficient generated more accurate VS predictions. Therefore, these six features were used to build the HML and DL models finally evaluated.
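The first, correlation-based stage of this selection can be sketched as ranking candidate logs by the absolute value of their Pearson correlation coefficient with VS. Using Pearson's r (rather than another correlation measure) and ranking by its absolute value are assumptions; the subsequent one-at-a-time addition of the remaining logs is not shown.

```python
import numpy as np


def rank_features_by_correlation(X, y, names):
    """Return [(name, r)] sorted by |Pearson r| between each column of X and y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))              # strongest correlation first
    return [(names[j], r[j]) for j in order]
```

In practice `names` would hold the log mnemonics (GR, VP, RHOB, NPHI, RES-SHT, RES-MED, RES-DEP, CP) and `X` the corresponding well-log columns.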

Results

Identifying the best performing algorithm for VS prediction

Tables 7 and 8 display the VS prediction accuracies for the training (70%) and validation (30%) subsets, respectively, selected from the 6622 data records available for wells MN#163 and MN#225. These represent the supervised training and learning performance of the HML and DL algorithms. The performance of five empirical relationships used for predicting VS from VP (Table 1) is also shown for each of these data subsets.

Table 7 VS Prediction accuracy statistics for the training subset (~ 70% of available data records) in respect of shear wave velocity (VS; km/s) (for MN#163 and MN#225)
Table 8 VS Prediction accuracy for the validation subset (~ 30% of available data records) in respect of shear wave velocity (VS; km/s) (for MN#163 and MN#225)

Table 9 displays the VS prediction accuracies of the supervised and trained HML and DL algorithms applied to all 6622 data records for wells MN#163 and MN#225. The performance of the five empirical relationships (Table 1) is also shown for comparison.

Table 9 VS prediction accuracy for all data records from wells MN#163 and MN#225, considered collectively

Close inspection of the models' VS prediction results (Tables 7, 8 and 9) reveals that the DL CNN model achieves exceptionally high VS prediction accuracy for both subsets and for all data records from the two wells used in supervised learning (e.g., Table 9, CNN: RMSE = 0.0456 km/s; AAPD = 1.477%; R2 = 0.9808). The HML models also achieve high VS prediction accuracy for both subsets and the full supervised learning dataset, but do not match the CNN model. The MELM-PSO model performs slightly better than the MELM-GA model. Ranked by VS prediction performance (RMSE), the DL and HML models and empirical equations therefore order as follows: CNN > MELM-PSO > MELM-GA > Castagna et al. > Eskandari et al. > Pickett > Brocher > Carroll.
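The three accuracy statistics quoted above and the RMSE-based ranking can be computed as in the sketch below. The definitions are assumptions for illustration (PD_i = 100 × (measured − predicted)/measured, AAPD = mean |PD_i|, R² = 1 − SS_res/SS_tot); if the paper defines them differently, for example R² as the squared Pearson correlation, those definitions should be substituted. The function names and the toy values are hypothetical.

```python
import numpy as np


def accuracy_stats(measured, predicted):
    """RMSE (km/s), AAPD (%) and R^2 for one model's VS predictions (assumed definitions)."""
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    resid = m - p
    pd_pct = 100.0 * resid / m  # assumed relative percentage deviation per record
    return {
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "AAPD": float(np.mean(np.abs(pd_pct))),
        "R2": float(1.0 - np.sum(resid ** 2) / np.sum((m - m.mean()) ** 2)),
    }


def rank_by_rmse(measured, predictions):
    """Return model names ordered from lowest (best) to highest RMSE."""
    stats = {name: accuracy_stats(measured, p) for name, p in predictions.items()}
    return sorted(stats, key=lambda name: stats[name]["RMSE"])


vs_measured = [2.0, 2.5, 3.0]
preds = {"model_B": [2.1, 2.6, 3.1], "model_A": [2.0, 2.5, 3.0]}
print(rank_by_rmse(vs_measured, preds))  # ['model_A', 'model_B']
```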

Tables 7, 8 and 9 show clearly that the DL and HML models substantially outperform all five empirical relationships that predict VS from VP. This outcome highlights the value of using information from a suite of well logs rather than relying on VP data alone to predict VS. Figure 10 displays the predicted versus measured VS values for the data records in each subset and in the full supervised learning dataset, as evaluated by the HML and DL models. The superior prediction performance of the DL CNN model is apparent, as it produces no substantial outlier predictions. In contrast, the MELM-PSO and MELM-GA models produce a few substantial outliers (only about 5 of the 6622 data records).

Fig. 10

Predicted versus measured shear wave velocity (VS) for each data record in the training and validation subsets and the full supervised learning dataset (6622 data records from Marun oil field wells MN#163 and MN#225)

Figure 11 reveals that the most commonly used empirical models (Table 1) provide workable VS prediction accuracy (R2 ~ 0.86) for this dataset but are substantially less reliable than the DL and HML models: their RMSE values are substantially greater than those of the CNN and HML models. The Castagna et al. (1993) relationship performs best among the empirical models evaluated for the Asmari reservoir (Tables 7, 8 and 9). Figure 12 displays the relative percentage error (PD%) of the VS predictions for each of the 6622 data records (wells MN#163 and MN#225) constituting the training and validation subsets, plotted sequentially for the high-performing DL and HML models. The PD% range for the DL model (~ −20% < PDi < ~ 15%) is substantially narrower than for the HML models (~ −70% < PDi < ~ 45%), although for most data records |PD%| is below 5% for both. The PD% range for the empirical relationships is much greater, and for most data records |PD%| exceeds 15%. The Castagna et al. (1993) model performs better (~ −20% < PDi < ~ 25%) than the other empirical models, and the Carroll (1969) relationship performs worst on the PD accuracy measure (~ −120% < PDi < ~ −20%).
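The PD% ranges and the share of records with |PD%| below 5% discussed above can be summarized per model with the sketch below. It assumes PD_i = 100 × (measured − predicted)/measured; the sign convention is not stated in this section, and `pd_summary` is a hypothetical helper.

```python
import numpy as np


def pd_summary(measured, predicted, band=5.0):
    """Per-record relative percentage error (PD%) and a summary of its spread.

    Assumed convention: PD_i = 100 * (measured_i - predicted_i) / measured_i.
    """
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    pd_pct = 100.0 * (m - p) / m
    return {
        "min_PD": float(pd_pct.min()),
        "max_PD": float(pd_pct.max()),
        # Fraction of records whose absolute error lies inside +/- band percent.
        "share_within_band": float(np.mean(np.abs(pd_pct) < band)),
    }


summary = pd_summary([2.0, 2.0, 2.0, 2.0], [2.0, 1.95, 2.3, 2.04])
print(summary)
```

Reporting the extreme PD values together with the within-band share separates the two statements made above: a model can have a wide PD range driven by a few outliers while most of its records still fall within ±5%.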

Fig. 11

VS predicted versus VS measured for the five empirical models applied to the full set of supervised learning data records (i.e., for wells MN#163 and MN#225)

Fig. 12

VS prediction error (PD%) compared for all 4635 training subset data records and 1987 validation subset data records for the DL and HML models evaluated (for wells MN#163 and MN#225)

A plot of VS RMSE versus iteration number (Fig. 13) for the DL and HML algorithms shows that all three algorithms converge rapidly to highly accurate solutions. The MELM-PSO and MELM-GA models converge at similar rates and after fewer iterations than the CNN algorithm. Although it requires more iterations, the CNN achieves the lowest RMSE, outperforming the HML algorithms after 100 iterations.

Fig. 13

VS RMSE versus iteration number for the training subset (drawn from wells MN#163 and MN#225) for the DL and HML algorithms during supervised learning

Discussion

Relative influences of the input variables on VS

Spearman’s correlation coefficient (ρ), expressed on a scale of −1 to + 1 (Gauthier 2001), is calculated (Eq. 16) to establish the nonparametric relationships between the input variables and VS.

$$\rho = \frac{\sum_{i=1}^{n} \left( T_{i} - \overline{T} \right)\left( Q_{i} - \overline{Q} \right)}{\sqrt{\sum_{i=1}^{n} \left( T_{i} - \overline{T} \right)^{2} \sum_{i=1}^{n} \left( Q_{i} - \overline{Q} \right)^{2}}}$$
(16)

where:

Ti = rank of the value of input variable T for data record i;

\(\overline{T}\) = mean rank for variable T;

Qi = rank of the value of dependent variable Q (VS) for data record i;

\(\overline{Q}\) = mean rank for dependent variable Q;

n = number of data records in dataset or subset.
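Spearman's ρ is Eq. 16 evaluated on ranks rather than raw values. The NumPy sketch below assigns average ranks to tied values and then applies Eq. 16 to those ranks; it is an illustrative implementation, not the authors' code, and the function names are hypothetical. In practice, `scipy.stats.spearmanr` returns the same coefficient.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n; tied values receive the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman_rho(t, q):
    """Eq. 16 applied to the ranks of input variable T and dependent variable Q (VS)."""
    rt, rq = average_ranks(t), average_ranks(q)
    dt, dq = rt - rt.mean(), rq - rq.mean()
    return float(np.sum(dt * dq) / np.sqrt(np.sum(dt ** 2) * np.sum(dq ** 2)))


print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0
```

Because only ranks enter the calculation, any monotonic relationship, such as the nonlinear one in the example, yields |ρ| = 1, which is why ρ is described as a nonparametric measure.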

Figure 14 identifies, from the ρ values calculated for all 6622 data records of the supervised learning dataset, that VP has, as expected, the greatest influence on VS, whereas CP has the least. The input variables NPHI, GR and RHOB also show substantial influences on VS, whereas the resistivity variables show negligible influences.

Fig. 14

VS relationships with input variables assessed based on Spearman’s non-parametric correlation coefficient values calculated for all data records of the supervised learning dataset (from wells MN#163 and MN#225)

Development and generalization of the CNN model applied to other Marun field wells

The best VS prediction model (DL CNN) established for the Asmari reservoir, trained by supervised learning on the dataset compiled from wells MN#163 and MN#225, is then applied to data previously unseen by the trained and validated model: the dataset compiled for Marun oil field well MN#179 (2072 data records; Tables 7, 8 and 9).

The statistical measures of accuracy achieved for these MN#179 data records using the same well-log input variables are listed in Table 10. These results confirm high VS prediction accuracy for the model trained and validated with data from the other two wells. Figure 15 plots measured versus predicted VS values for the CNN model trained with MN#163 and MN#225 data records and applied to all 2072 data records from well MN#179. The prediction performance is very good (RMSE = 0.068 km/s; R2 = 0.97), confirming the model's reliability. This makes it suitable for application in other wells drilled into the Asmari reservoir in the Marun oil field for which VS well-log data have not been recorded. To apply the trained model to other wells, a standard suite of well logs is required for the wells of interest. Fortunately, such a suite of well logs is available for most of the existing wells in the field (Table 10).
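The blind-well test described above amounts to a leave-one-well-out evaluation, sketched below. In this minimal sketch, which is an assumption rather than the paper's code, a least-squares model stands in for the trained CNN, the well names serve only as labels for synthetic records, and input scaling statistics are computed from the training wells alone so that no information from the blind well reaches the model. `blind_well_eval` is a hypothetical name.

```python
import numpy as np


def blind_well_eval(X, y, well, train_wells, test_well):
    """Fit on records from train_wells only, then report RMSE on the blind test_well."""
    well = np.asarray(well)
    tr = np.isin(well, train_wells)
    te = well == test_well
    # Standardize inputs with training-well statistics only (no leakage from the blind well).
    mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0)
    Z = (X - mu) / sd
    # Hypothetical stand-in for the trained CNN: least squares with an intercept.
    A = np.column_stack([np.ones(tr.sum()), Z[tr]])
    coef, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
    pred = np.column_stack([np.ones(te.sum()), Z[te]]) @ coef
    rmse = float(np.sqrt(np.mean((y[te] - pred) ** 2)))
    return rmse, int(tr.sum()), int(te.sum())


# Synthetic records labelled with the three well names (values are not real log data).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 3.0 + 0.5 * X[:, 0] - 0.2 * X[:, 1] + 0.05 * rng.normal(size=300)
well = np.repeat(["MN#163", "MN#225", "MN#179"], 100)
rmse_blind, n_train, n_test = blind_well_eval(X, y, well, ["MN#163", "MN#225"], "MN#179")
print(n_train, n_test, round(rmse_blind, 3))
```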

Fig. 15

Cross plot of predicted versus measured VS values for the DL CNN model trained with data from wells MN#163 and MN#225 and applied to data records from well MN#179 previously unseen by the trained and validated model

Table 10 VS prediction accuracy of the CNN model (trained with MN#163 and MN#225 dataset), applied to the Asmari reservoir section of Marun Field well MN#179 previously unseen by the trained and validated model

Figure 16 shows the VS prediction performance of the DL CNN model applied to the dataset from well MN#179 in terms of percentage error (PD%) for each data record, arranged in order of sample depth through the Asmari reservoir. While most PD errors for these data records are within ±5%, several PD errors of between 5% and 15% are recorded in the lower 500 samples (equivalent to the lower 100 m of the Asmari section). These outlying values in the lower part of the Asmari section drilled in well MN#179 merit further analysis, but their prediction accuracy remains within reasonable error limits. The DL CNN model described and evaluated here could be used in a similar way to predict VS in other fields, but it would first need to be recalibrated with some direct VS measurements from at least one well in each field or reservoir to which it is applied.

Fig. 16

VS prediction error (PD%) for the DL CNN model trained with data from wells MN#163 and MN#225 and applied to data records from well MN#179 previously unseen by the trained and validated model

Recommendations for future research works

Future work could evaluate the effect of including drilling parameters, such as standpipe pressure and mud flow rate, as input variables alongside well logs to predict VS. The current findings, in which adding RES-DEP and RES-SHT to the four correlation-selected features improved accuracy, suggest that additional relevant input variables could yield models with higher prediction accuracy. Optimizers other than PSO and GA, such as the firefly algorithm, could also be considered in developing high-performance hybrid models for VS prediction (Choubin et al. 2019; Ghorbani et al. 2020b; Kalbasi et al. 2021; Mohamadian et al. 2022; Rajabi et al. 2022b). The proposed method should also be tested in a wide range of other applications, e.g., various energy, ecological and natural research applications (Ghorbani et al. 2017; Ghorbani et al. 2019; Taherei Ghazvinei et al. 2018; Ahmadi et al. 2020; Band et al. 2020a; Band et al. 2020b; Emadi et al. 2020; Lei et al. 2020; Shamshirband et al. 2020; Barjouei et al. 2021; Hazbeh et al. 2021a). The methodology may also prove effective in areas ranging from computational fluid, pressure and hydrological modeling to environmental simulation (Ghalandari et al. 2019b; Rezakazemi et al. 2019; Seifi et al. 2020; Farsi et al. 2021a; Mahmoudi et al. 2021). Comparative analysis with other machine learning methods (e.g., Asadi et al. 2019; Ghalandari et al. 2019a; Ghorbani et al. 2020c; Joloudari et al. 2020; Mosavi et al. 2020; Sadeghzadeh et al. 2020; Shabani et al. 2020; Abdali et al. 2021; Mosavi and Safaei-Farouji 2021) would provide insight into the true potential of the proposed method. To further improve its accuracy and performance, other deep learning, ensemble and hybrid methods, such as those suggested in the literature (Band et al. 2020b; Dehghani et al. 2020; Ghorbani et al. 2020a; Mosavi et al. 2020; Nabipour et al. 2020; Mousavi et al. 2021; Shamsirband and Mehri Khansari 2021), could be considered.

Summary and conclusions

A large dataset of well-log data records was compiled for the Asmari reservoir section penetrated by three Marun oil field wells (MN#163, MN#225 and MN#179), onshore Iran, to predict shear wave velocity (VS). The performances of two hybrid machine learning prediction models (MELM-PSO and MELM-GA), one deep learning model (CNN) and commonly used empirical models for predicting VS were compared using the compiled dataset. For supervised training of the MELM-PSO, MELM-GA and CNN models, data from two wells (MN#163 and MN#225; 6622 data records split 70%:30% between training and validation subsets) were used. To independently test the best-performing trained model (CNN), 2072 data records from well MN#179, previously unseen by the trained and validated model, were also evaluated.

  • The recorded VS prediction performance (RMSE) ranks the DL, HML models and empirical equations as follows: (Best) CNN > MELM-PSO > MELM-GA > Castagna et al. > Eskandari et al. > Pickett > Brocher > Carroll (Worst).

  • The CNN model delivered the highest VS prediction accuracy based on supervised learning using data records from wells MN#163 and MN#225 (RMSE = 0.0456 km/s; R2 = 0.9808 when applied to all 6622 data records).

  • The hybrid machine learning algorithms MELM-PSO and MELM-GA also provided highly credible VS predictions (RMSE = 0.05 to 0.06 km/s; R2 ~ 0.96 when applied to all 6622 data records), whereas the empirical models achieved VS prediction accuracies of RMSE > 0.11 km/s and R2 < 0.87.

  • Applying the trained and validated CNN model to the previously unseen 2072 data records from the Asmari reservoir penetrated by well MN#179 achieved VS prediction accuracy of RMSE = 0.068 km/s and R2 = 0.97.

  • This impressive prediction performance confirms that the CNN model trained with supervised data from two wells can be applied to accurately predict VS in other Asmari reservoir sections in the Marun oil field from basic well log variables where VS logs have not been recorded.

  • Properly trained deep learning and hybrid machine learning models, such as those evaluated, offer a better method of predicting VS from multiple well-log variables, in a supervised context and with data previously unseen by the trained and validated models, than the commonly used empirical models based solely on VP data.