1 Introduction

Computational simulation is increasingly used in structural mechanics for design optimisation, reliability analysis, uncertainty quantification and sensitivity analysis. In practical applications, identifying the parameters of mechanical structures and assessing their properties against real physical experiments require a large number of simulations of high-fidelity models. However, repeatedly simulating a parameterised model remains extremely time-consuming and involves a huge computational effort. In order to reduce the computational burden associated with predictive modelling of complex engineering problems, the model’s response is often replaced by a simplified mathematical model known as a surrogate model, also referred to as a meta-model. In general, the surrogate model is built by treating the original model as a black box that is evaluated on only a limited number of simulations. Accordingly, the approximate model can be built to resemble the behaviour of the high-fidelity model while preserving accuracy and being computationally cheaper to evaluate. Particularly over the last decade, different surrogate modelling techniques have been proposed in the literature, such as proper orthogonal decomposition (POD) [1,2,3,4,5], polynomial response surfaces [6, 7], polynomial chaos expansion (PCE) [8,9,10], Gaussian process models [11,12,13] and support vector regression (SVR) [14,15,16]. Among these techniques, SVR has attracted considerable attention due to its efficiency and robustness in approximating the model response in various engineering problems. For instance, SVR was applied to structural crashworthiness design optimisation in [17]. Pan et al. [18] utilised SVR as a surrogate model for lightweight vehicle design. A least-squares support vector regression meta-model technique for sheet metal forming optimisation was proposed in [19].
Moustapha et al. applied SVR to nonlinear structural analysis in [20]. More recently, SVR was applied to structural reliability analysis in [21] and to measuring water quality parameters in [22]. Moreover, a comprehensive comparison between SVR and various meta-model techniques, including response surface methodology, kriging, radial basis functions and multivariate adaptive regression splines, was carried out in [23]. That study shows that SVR outperformed all other meta-model techniques in terms of prediction accuracy and robustness.

In fact, many engineering problems involve non-smooth responses which exhibit discontinuities or sharp changes for certain combinations of the input parameters. This behaviour often occurs in the context of the nonlinear finite element method, where the output response may behave non-smoothly in the parametric space. Consequently, approximating the non-smooth response of the underlying physical problem by a smooth global surrogate model might lead to large errors and poor prediction capability. This issue has been addressed in the literature in [24,25,26]. To overcome this problem, several attempts have been proposed based on data mining techniques such as clustering [27], which automatically determines the input domains of similar responses. In addition, a decomposition of the input space was applied in [28], where the different behaviours of the response surface are approximated locally. Multi-element approaches were applied in [29,30,31], where the parametric space is discretised into non-overlapping elements. Thereafter, the surrogate model is constructed element-wise, which weakens the influence of the non-smoothness of the response within each element. However, these techniques might impose additional computations to tackle the non-smooth behaviour of the output. In this paper, the optimisation of the SVR parameters is investigated in order to build a single, global surrogate model with a high capability to capture non-smooth responses. The developed SVR model is examined on a nonlinear elasto-plastic problem where the output of interest exhibits highly non-smooth behaviour in the input parametric space. The accuracy and the computational cost of SVR are studied and compared for two types of kernel functions, namely the Gaussian and Matérn 5/2 kernels. More details about the behaviour of different types of kernel functions can be found in [32].
Thereafter, the computational cost of SVR is reduced by applying an anisotropic training grid, where the training points are increased only along the input parameter in which the response behaves highly nonlinearly. Finally, this paper provides a smoothing technique for the non-smooth output response using linear regression. The smoothing method can be applied to enhance the accuracy and prediction capability of SVR, in addition to reducing the computational cost.

2 Support vector regression

The main concept of SVR is to generate a surrogate model of the underlying complex physical response which depends only on a given training set \(S=\{{\textbf{x}}_i,y_i\}_{i=1}^{n}\) of n training points \({\textbf{x}}_i\in {\mathbb {R}}^{d}\) with corresponding output \(y_i \in {\mathbb {R}}\). The SVR aims at approximating a linear regression function

$$\begin{aligned} f\left( {\textbf{x}}\right) =\langle {\textbf{w}},\varvec{\phi }\left( {\textbf{x}}\right) \rangle +b , \end{aligned}$$
(1)

where \(\langle \cdot \,,\cdot \rangle \) denotes the inner product, \({\textbf{w}}\) and \(b \in {\mathbb {R}}\) are the regression parameters to be estimated using the training data \({\textbf{x}}\) and \(\varvec{\phi }\left( {\textbf{x}}\right) \) is the transformation of \({\textbf{x}}\) from the input space into the so called feature space F. Here, the vector \({\textbf{x}}\) can be mapped into the feature space by using a kernel function \(K\left( {\textbf{x}}_i\,,{\textbf{x}}_j\right) \) that defines an inner product in this feature space as

$$\begin{aligned} \langle \varvec{\phi }\left( {\textbf{x}}_i\right) ,\varvec{\phi }\left( {\textbf{x}}_j\right) \rangle =K\left( {\textbf{x}}_i,{\textbf{x}}_j\right) . \end{aligned}$$
(2)

Accordingly, the solution vector \({\textbf{w}}\) can be rewritten in its dual representation as [33]

$$\begin{aligned} {\textbf{w}}=\sum _{i=1}^n \left( {\hat{\alpha }}_i-\alpha _i\right) \varvec{\phi }\left( {\textbf{x}}_i\right) =\sum _{i=1}^n {\tilde{\alpha }}_i \varvec{\phi }\left( {\textbf{x}}_i\right) , \end{aligned}$$
(3)

where \({\hat{\alpha }}_i\) and \(\alpha _i\) are the Lagrange multipliers representing the dual variables. Substituting Eqs. (3) and  (2) into Eq. (1), the regression function becomes

$$\begin{aligned} f\left( {\textbf{x}}\right) =\sum _{i=1}^n {\tilde{\alpha }}_i K\left( {\textbf{x}}_i,{\textbf{x}}\right) +b=\tilde{\varvec{\alpha }}^T{\textbf{k}}+b\,, \end{aligned}$$
(4)

with \(\tilde{\varvec{\alpha }}=\left[ {\tilde{\alpha }}_1,\cdots ,{\tilde{\alpha }}_n\right] ^T\) and \({\textbf{k}}=\left[ k_1 ,\cdots , k_n\right] ^T\) where \(k_i=K\left( {\textbf{x}}_i,{\textbf{x}}\right) \). The variables \(\tilde{\varvec{\alpha }}\) and b are determined by means of an optimisation problem. For this purpose, the \(\varepsilon \)-insensitive loss function is introduced as follows [34]:

$$\begin{aligned} {\mathcal {L}}^{\varepsilon }\left( {\textbf{x}},y,f\left( {\textbf{x}}\right) \right) =\vert y-f\left( {\textbf{x}}\right) \vert _{\varepsilon } = \textrm{max}\left( 0\,,\vert y-f\left( {\textbf{x}}\right) \vert -\varepsilon \right) . \end{aligned}$$
(5)
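As a concrete illustration, the loss of Eq. (5) can be sketched in a few lines (a minimal NumPy version; the function name is ours):

```python
import numpy as np

def eps_insensitive_loss(y, f_x, eps):
    """Epsilon-insensitive loss of Eq. (5): deviations smaller than eps cost nothing."""
    return np.maximum(0.0, np.abs(np.asarray(y) - np.asarray(f_x)) - eps)
```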

Evaluating Eq. (5) at the training data points \({\textbf{x}}_i\) leads to the definition of the slack variables

$$\begin{aligned} \xi _i+{\hat{\xi }}_i={\mathcal {L}}^{\varepsilon }\left( {\textbf{x}}_i,y_i,f\left( {\textbf{x}}_i\right) \right) \,. \end{aligned}$$
(6)

The interpretation of the parameter \(\varepsilon \) and of the slack variables \(\xi _i\) and \({\hat{\xi }}_i\) is illustrated in Fig. 1.

Fig. 1

Nonlinear SVR structural parameters for one input parameter, the regression function \(f\left( x\right) :={\hat{y}}\left( x\right) \) and the \(\varepsilon -\)tube. The blue dots represent the training data points

It can be seen that SVR aims to construct a function such that the training points lie inside a tube of given radius \(\varepsilon \). The slack variables \(\xi \) and \({\hat{\xi }}\) indicate how far a training point lies outside the \(\varepsilon \)-tube. For instance, \({\xi }=0\) if the corresponding training point lies above the tube. In contrast, \({\hat{\xi }}=0\) if a training point is located below the tube. Finally, both slack variables take the value \(\xi ={\hat{\xi }}=0\) if the training point lies inside the tube. Here, the points that are not strictly located inside the \(\varepsilon \)-tube are called the support vectors. Typically, the SVR formulation is written as an optimisation problem that minimises the norm \(\frac{1}{2}\Vert {\textbf{w}}\Vert ^2\) together with the penalised slack variables of the \(\varepsilon \)-insensitive loss function as

$$\begin{aligned} \begin{aligned} \underset{{\textbf{w}},b,\hat{\varvec{\xi }},\varvec{\xi }}{\textrm{min}}\left\{ \frac{1}{2} \Vert {\textbf{w}}\Vert ^2+C\sum _{i=1}^n\left( \xi _i+{\hat{\xi }}_i\right) \right\} , \quad i=1,..,n,\\ \text {subject to} {\left\{ \begin{array}{ll} \langle {\textbf{w}},\varvec{\phi }\left( {\textbf{x}}_i\right) \rangle + b -y_i\le \varepsilon +\xi _i,\\ y_i-\langle {\textbf{w}},\varvec{\phi }\left( {\textbf{x}}_i\right) \rangle - b \le \varepsilon +{\hat{\xi }}_i ,\\ {\hat{\xi }}_i\ge 0,\\ \xi _i \ge 0, \end{array}\right. } \end{aligned} \end{aligned}$$
(7)

where C is a positive weighting constant known as the box constraint.

The primal optimisation problem represented in Eq. (7) can be transformed into its dual form with the aid of the Lagrange multiplier technique as [35, 36]

$$\begin{aligned} \begin{aligned}&\underset{\varvec{\hat{\alpha }},\varvec{\alpha }}{\textrm{max}} \left\{ L\left( {\varvec{\hat{\alpha }},\varvec{\alpha }}\right) \right\} ,&\\&\text {subject to} {\left\{ \begin{array}{ll} 0\le {\hat{\alpha }}_i,\alpha _i \le C,\\ \sum _{i=1}^n\left( {\hat{\alpha }}_i-\alpha _i\right) =0, \end{array}\right. }&i=1,..,n, \end{aligned} \end{aligned}$$
(8)

where

$$\begin{aligned} L=\sum _{i=1}^n\left( {\hat{\alpha }}_i-\alpha _i\right) y_i-\varepsilon \sum _{i=1}^n\left( {\hat{\alpha }}_i +\alpha _i\right) {-}\frac{1}{2}\sum _{i=1}^n\sum _{j=1}^n\left( {\hat{\alpha }}_i-\alpha _i\right) \left( {\hat{\alpha }}_j-\alpha _j\right) K\left( {\textbf{x}}_i,{\textbf{x}}_j\right) , \end{aligned}$$
(9)

and subject to the Karush–Kuhn–Tucker complementarity conditions

$$\begin{aligned} \alpha _i\left( \langle {\textbf{w}},\varvec{\phi }\left( {\textbf{x}}_i\right) \rangle + b -y_i-\varepsilon -\xi _i\right)&=0,\nonumber \\ {\hat{\alpha }}_i\left( y_i-\langle {\textbf{w}},\varvec{\phi }\left( {\textbf{x}}_i\right) \rangle -b -\varepsilon -{\hat{\xi }}_i\right)&=0,\nonumber \\ \alpha _i{\hat{\alpha }}_i&=0,\nonumber \\ \xi _i{\hat{\xi }}_i&=0,\nonumber \\ \left( \alpha _i-C\right) \xi _i&=0,\; \nonumber \\ \left( {\hat{\alpha }}_i-C\right) {\hat{\xi }}_i&=0. \end{aligned}$$
(10)

The solution of the optimisation problem results in the dual variables \(\tilde{\varvec{\alpha }}=\left[ {\tilde{\alpha }}_1\cdots {\tilde{\alpha }}_n\right] ^T\) with \({\tilde{\alpha }}_i={\hat{\alpha }}_i-\alpha _i\) and an optimal bias term b that satisfies the following equations [37]

$$\begin{aligned} b&=y_i-\sum _{j=1}^n \left( {\hat{\alpha }}_j-\alpha _j\right) \, K\left( {\textbf{x}}_i\,,{\textbf{x}}_j\right) -\varepsilon \quad \text {for}\,{\alpha }_i\in \left( 0,C\right) ,\nonumber \\ b&=y_i-\sum _{j=1}^n \left( {\hat{\alpha }}_j-\alpha _j\right) \, K\left( {\textbf{x}}_i\,,{\textbf{x}}_j\right) +\varepsilon \quad \text {for}\,{{{\hat{\alpha }}}}_i\in \left( 0,C\right) . \end{aligned}$$
(11)

In general, the parameters C and \(\varepsilon \) can be chosen as

$$\begin{aligned} \varepsilon =\frac{\text {iqr}\left( {\textbf{Y}}\right) }{13.49}, \end{aligned}$$
(12)

and

$$\begin{aligned} C=\frac{\text {iqr}\left( {\textbf{Y}}\right) }{1.349}, \end{aligned}$$
(13)

where \(\text {iqr}( {\textbf{Y}})\) is the interquartile range of the response spectrum \({\textbf{Y}}=\left\{ y_1,\cdots ,y_n\right\} \) of the input training data set.
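These heuristics can be written down directly; the following sketch (NumPy, with our function name) mirrors Eqs. (12) and (13). Note that \(\text {iqr}\left( {\textbf{Y}}\right)/1.349\) is the classical robust estimate of the standard deviation of normally distributed data, so \(\varepsilon \) corresponds to roughly one tenth of that estimate.

```python
import numpy as np

def default_svr_params(Y):
    """Rule-of-thumb SVR parameters of Eqs. (12)-(13)."""
    q75, q25 = np.percentile(Y, [75, 25])
    iqr = q75 - q25       # interquartile range of the training responses
    eps = iqr / 13.49     # tube width, Eq. (12)
    C = iqr / 1.349       # box constraint, Eq. (13)
    return eps, C
```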

2.1 Error measures

In order to ensure the quality and the prediction accuracy of the surrogate model, two main error measures are considered in this study. First, the overall quality of the SVR approximation is determined by using a relative error measure

$$\begin{aligned} \epsilon _\textrm{rel}=\frac{\sum _{i=1}^{n^*}\epsilon _{i}}{n^*}, \end{aligned}$$
(14)

where \(n^*\) is the number of the test data points and \(\epsilon _{i}\) is the relative absolute error, given by

$$\begin{aligned} \epsilon _{i}=\frac{\vert y_i-{\hat{y}}_i\vert }{\vert {\tilde{y}}\vert }, \end{aligned}$$
(15)

where \(y_i=y\left( {\textbf{x}}^*_i\right) \) is the exact response evaluated at the test data points \({\textbf{x}}^*\), \({\hat{y}}_i={\hat{y}}\left( {\textbf{x}}^*_i\right) \) is the SVR surrogate model evaluated at the same test points, and \({\tilde{y}}\) is the average output obtained from the exact response

$$\begin{aligned} {\tilde{y}}=\frac{\sum _{i=1}^{n^*} y_i}{n^*}. \end{aligned}$$
(16)

Second, a local error estimator, known as the Chebyshev norm estimator, is used to determine the local prediction accuracy of SVR

$$\begin{aligned} \epsilon _{\max }=\text {max}\lbrace e_1,\cdots ,e_{n^*} \rbrace , \end{aligned}$$
(17)

where \(e_i\) is the absolute error defined as the absolute difference between each data point evaluated for the exact response and the SVR surrogate model

$$\begin{aligned} e_i=\vert y_i-{\hat{y}}_i\vert . \end{aligned}$$
(18)
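Both error measures are straightforward to evaluate on a test set; a minimal NumPy sketch (the function name is ours) reads:

```python
import numpy as np

def error_measures(y_exact, y_svr):
    """Relative error of Eq. (14) and Chebyshev-norm error of Eq. (17)."""
    y_exact = np.asarray(y_exact, dtype=float)
    y_svr = np.asarray(y_svr, dtype=float)
    y_mean = y_exact.mean()                                   # Eq. (16)
    eps_rel = np.mean(np.abs(y_exact - y_svr) / abs(y_mean))  # Eqs. (14)-(15)
    eps_max = np.max(np.abs(y_exact - y_svr))                 # Eqs. (17)-(18)
    return eps_rel, eps_max
```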

3 Numerical studies

This numerical study aims to construct a global surrogate model using SVR that is capable of capturing highly non-smooth and nonlinear responses. In order to determine an appropriate kernel function K and the optimal values of the SVR parameters C and \(\varepsilon \), a one-dimensional phenomenological elasto-plastic model is considered first. The model has three input parameters, namely the total strain \(\varepsilon \), the hardening parameter k and the yield stress \(\sigma ^y\). Thereafter, the optimised SVR parameters are applied to a two-dimensional four-point bending beam where the response of interest exhibits highly non-smooth behaviour in the parametric space. Furthermore, the computational cost of the SVR is reduced by using an anisotropic training grid. In addition, the accuracy of the constructed SVR is improved by smoothing the response surface using linear regression.

3.1 One-dimensional elasto-plastic model

Consider a one-dimensional, purely phenomenological elasto-plastic model where the strain \(\varepsilon \) is a scalar quantity. The total strain \(\varepsilon \) can be additively decomposed into an elastic and a plastic part as

$$\begin{aligned} \varepsilon =\varepsilon ^e + \varepsilon ^p\,. \end{aligned}$$
(19)

The splitting is performed by a return-mapping algorithm, as described for the one-dimensional case in [38]. In this example, the plastic strain \(\varepsilon ^p\) is considered as the quantity of interest, with the total strain \(\varepsilon \), the hardening parameter k and the yield stress \( \sigma ^y\) as input parameters, denoted by \({\textbf{x}}=\left[ \varepsilon ,\;k,\;\sigma ^y\right] \). The variation ranges of the input parameters are given in Table 1. The Young’s modulus is fixed at \(E={210000}\,\hbox {MPa}\).

Table 1 Input parameters for one-dimensional elasto-plastic model

First, Cartesian grids are used as training and test points, e.g. with eight points in each dimension

$$\begin{aligned} {\mathfrak {X}}_{8\times 8 \times 8}&=\lbrace \varepsilon _i \rbrace _{i=1}^{8} \times \lbrace k_i \rbrace _{i=1}^{8} \times \lbrace \sigma ^y_i \rbrace _{i=1}^{8}. \end{aligned}$$
(20)

The training grid (20) and the separation of the parametric space into an elastic region (\(\varepsilon ^p=0\)) and a plastic region (\(\varepsilon ^p>0\)) are shown in Fig. 2.

Fig. 2

Cartesian grid \({\mathfrak {X}}_{8\times 8 \times 8}\) for the input parameters \({\textbf{x}}_i=\left[ \varepsilon _i,k_i,\sigma ^y_i\right] _{i=1}^{ 512}\;\) (left) and the border surface between \(\varepsilon ^p>0\) and \(\varepsilon ^p=0\;\) (right)

The reference response surface shown in Fig. 3 is obtained from the exact solution by fixing the value of the total strain at \(\varepsilon ={0.005}\) and \(\varepsilon ={0.0028}\), respectively. It can be seen that the exact response is merely nonlinear in Fig. 3a and begins to behave non-smoothly (i.e. only \(C^0\)-continuous at the boundary of the elastic region where \(\varepsilon ^p=0\)) as the value of the total strain decreases, as shown in Fig. 3b.

Fig. 3

Exact response surface \(\varepsilon ^p\) obtained from the exact solution by fixing the value of the total strain \(\varepsilon \)

In order to adopt a kernel function that can capture the non-smooth transition of the response surface in the input parametric space, two kernel functions are applied in this example.

First, the Gaussian kernel function defined as

$$\begin{aligned} K({\textbf{x}}_i,{\textbf{x}}_j)=\exp \left( -\frac{\Vert {\textbf{x}}_i-{\textbf{x}}_j\Vert ^2}{2\rho ^2}\right) , \end{aligned}$$
(21)

is applied to build the surrogate model using SVR. The training points are distributed uniformly in the parametric space with ten points in each dimension. The exact response surface and the approximation obtained from SVR at a fixed value of the total strain \(\varepsilon ={0.0028}\) are shown in Fig. 4b.

Fig. 4

Exact response surface and the SVR approximation using the Gaussian kernel function at a fixed value of the total strain \(\varepsilon \)

It can be seen that the Gaussian kernel provides a qualitatively sufficient approximation of the exact response surface in the range where \(\varepsilon ^p=0\), and it exhibits a small deviation in the nonlinear region where \(\varepsilon ^p>0\).

Now, the Matérn 5/2 kernel function defined as

$$\begin{aligned} K({\textbf{x}}_i,{\textbf{x}}_j)=\left( 1+\sqrt{5}\frac{\Vert {\textbf{x}}_i -{\textbf{x}}_j\Vert }{\rho }+\frac{5}{3}\frac{\Vert {\textbf{x}}_i-{\textbf{x}}_j\Vert ^2}{\rho ^2}\right) \exp \left( -\sqrt{5}\frac{\Vert {\textbf{x}}_i-{\textbf{x}}_j\Vert }{\rho }\right) , \end{aligned}$$
(22)

is applied to the same problem and compared with the Gaussian kernel using the error criteria discussed in Sect. 2.1. For both kernels, the correlation length is set to \(\rho =1\), and the optimal values of the SVR parameters, namely the box constraint C and the tube width \(\varepsilon \), are determined experimentally as

$$\begin{aligned} \varepsilon _\textrm{opt}&=\frac{\text {iqr}\left( {\textbf{Y}}\right) }{200} \end{aligned}$$
(23)
$$\begin{aligned} C_\textrm{opt}&=\frac{\text {iqr}\left( {\textbf{Y}}\right) }{0.001}\;. \end{aligned}$$
(24)
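The kernel of Eq. (22) and the parameter choices of Eqs. (23)–(24) can be reproduced with any SVR library that accepts a user-defined kernel. The following sketch uses scikit-learn; the toy \(C^0\) response is a hypothetical stand-in for the elasto-plastic model, not the data of this study:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.svm import SVR

def matern52(Xi, Xj, rho=1.0):
    """Matern 5/2 kernel of Eq. (22); note 5 r^2 / (3 rho^2) = (sqrt(5) r / rho)^2 / 3."""
    s = np.sqrt(5.0) * cdist(Xi, Xj) / rho
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

# toy non-smooth response with a C^0 kink, standing in for the plastic strain
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = np.maximum(0.0, X.ravel() - 0.5)

iqr = np.subtract(*np.percentile(y, [75, 25]))
svr = SVR(kernel=matern52, C=iqr / 0.001, epsilon=iqr / 200.0)  # Eqs. (23)-(24)
y_hat = svr.fit(X, y).predict(X)
```

scikit-learn calls the kernel with two sample matrices and expects the Gram matrix, so the same callable serves for training and prediction.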

The absolute error \(\epsilon _{\max }\) and the relative error \(\epsilon _\textrm{rel}\) obtained with the Gaussian and Matérn kernel functions are shown in Fig. 5 with respect to the number of training points.

Fig. 5

Accuracy versus the number of training points of the SVR surrogate model using Gaussian and Matérn kernel functions

It can be seen that the Matérn kernel function outperforms the Gaussian kernel for both error measures. The Gaussian kernel exhibits strong oscillations in the absolute error and an increase in the relative error as the number of training points grows (\(n>500\)). In contrast, the Matérn kernel provides higher accuracy with an increasing number of training points for both errors. However, it shows a slight oscillation for large numbers of training points (\(n>10^4\)) due to over-fitting.

3.2 Four-point bending beam with elasto-plastic material behaviour

In this numerical study, SVR is investigated on a nonlinear elasto-plastic analysis of a four-point bending test. The \({200}\,\hbox {mm} \times {20}\,\hbox {mm}\) beam is subjected to displacement-controlled loading at \(x= {60}\,\hbox {mm} \) and \(x= {140}\,\hbox {mm}\), respectively. The beam is supported at \(x= {10}\,\hbox {mm}\) and \(x= {190}\,\hbox {mm}\). A Poisson’s ratio of \(\nu =0.3\) and a von Mises ideal plasticity yield criterion with a plane stress assumption are considered in this example. The height of the non-plastified cross-section \(d^{el}\) is considered as the quantity of interest, with the Young’s modulus E, the yield stress \(\sigma ^y\) and the applied displacement \({\bar{u}}\) as input parameters. The variation ranges of the input parameters are given in Table 2.

Table 2 Input parameters for two-dimensional elasto-plastic model

The contour plot of the equivalent plastic strain \(\varepsilon ^p_{eq}\) obtained from FEM simulation at a fixed value of displacement \({\bar{u}}={0.2}\,\hbox {mm}\) is plotted in Fig. 6.

Fig. 6

Equivalent plastic strain \(\varepsilon ^p_{eq}\) (PEEQ) at a fixed value of the applied displacement \({\bar{u}}\;(\hbox {mm})\), where \(d^{el}\) denotes the non-plastified cross-section

The non-plastified cross-section height at the centre of the beam is determined from the elements whose equivalent plastic strain vanishes at the centre of gravity (centroid) as

$$\begin{aligned} d^{el}=N_{e}\,\vert _{\varepsilon ^p_{eq}=0}\,l_e, \end{aligned}$$
(25)

where \(N_e\vert _{\varepsilon ^p_{eq}=0}\) is the number of elements with zero centroid equivalent plastic strain and \(l_e\) is the element length. In general, the equivalent plastic strain can be estimated as

$$\begin{aligned} \varepsilon ^p_{eq}:=\int _0^t\sqrt{\frac{2}{3}}\Vert \dot{\varvec{\varepsilon }}^p\Vert \,\textrm{d}\tau =\sqrt{\frac{2}{3}\,\varvec{\varepsilon }^p\cdot \cdot \;\varvec{\varepsilon }^p}\;, \end{aligned}$$
(26)

where \(\varvec{\varepsilon }^p\) denotes the plastic strain tensor. For higher precision, the equivalent plastic strain is evaluated at the centroid of each element, where superconvergence is guaranteed, as illustrated in Fig. 7.
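Assuming the plastic strain is available as a full tensor at the centroid, the double contraction of Eq. (26) is a one-liner (NumPy sketch, our function name):

```python
import numpy as np

def eq_plastic_strain(eps_p):
    """Equivalent plastic strain of Eq. (26): sqrt(2/3 * eps_p : eps_p)."""
    eps_p = np.asarray(eps_p, dtype=float)
    return np.sqrt(2.0 / 3.0 * np.tensordot(eps_p, eps_p))  # double contraction
```

For a volume-preserving uniaxial plastic strain \(\textrm{diag}(e,-e/2,-e/2)\), this returns e, as expected.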

Fig. 7

Mesh quantities and their distances at the symmetry edge at the centre of the beam

In this example, the beam height at the centre contains \(N_{e}=100\) finite elements with an element length of \(l_e={0.2}\,\hbox {mm}\). The exact response surface obtained from \(n=8000\) FEM simulations using a Cartesian grid distributed uniformly in the input parametric space is shown in Fig. 8. It can be observed that the response surface exhibits highly non-smooth and discontinuous behaviour, in addition to linear and nearly flat regions, for certain combinations of the input parameters. This discontinuity is mainly caused by the fact that the non-plastified height of the beam \(d^{el}\) is measured by counting the number of elements \(N_e\) with zero equivalent plastic strain \(\varepsilon ^{p}_{eq}\) along the beam cross-section. Consequently, \(d^{el}\) can only change by multiples of the element length \(l_e\) for different combinations of the input parameters. In order to approximate this response surface, the SVR model is built with the Matérn 5/2 kernel function using the optimal values of the box constraint C and the tube width \(\varepsilon \) (see Eqs. (23) and (24)).

Fig. 8

Exact response surface \(d^{el}\) obtained from \(n=8000\) FEM simulations using Cartesian grid at different values of displacement \({\bar{u}}\)

The accuracy of the SVR approximation using the error criteria discussed in Sect. 2.1 is given in Fig. 9. It is evident that the SVR approximation shows a higher local error where the response behaves discontinuously, as shown in Fig. 9a. Approximately \(n=6859\) training points are required to obtain an absolute error of \(\epsilon _{\max }=0.44\). In contrast, the global relative error is \(\epsilon _\textrm{rel}=0.017\) for the same number of training points.

Fig. 9

Accuracy versus the number of training points of the SVR surrogate model using the Matérn kernel function

In fact, the response surface shown in Fig. 8 behaves approximately linearly with respect to the yield stress \(\sigma ^y\) and the Young’s modulus E, whereas it is nonlinear with respect to the displacement loading \({\bar{u}}\). Therefore, a natural choice to reduce the computational cost of the SVR model is to assign a higher number of training points only in the direction of the input parameter \({\bar{u}}\). This reduction of the training points in specific dimensions can be achieved by using anisotropic training grids. For comparison with the previously used isotropic training grids, the number of training points in the anisotropic grid is halved in the parameter directions \(\sigma ^y\) and E, while it remains unchanged in the direction of \({\bar{u}}\). A comparison between the isotropic and anisotropic training grids is shown in Fig. 10.
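Such a grid is simply a tensor product of one-dimensional point sets with different resolutions per parameter. A sketch follows; the parameter ranges below are illustrative placeholders, not the values of Table 2:

```python
import numpy as np

def cartesian_grid(*axes):
    """Tensor-product training grid; unequal axis lengths give an anisotropic grid."""
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=-1)

# refine only along the loading direction u_bar (19 points vs. 10 in sigma_y and E)
u_bar = np.linspace(0.05, 0.25, 19)            # illustrative range
sig_y = np.linspace(200.0, 400.0, 10)          # illustrative range
E_mod = np.linspace(1.9e5, 2.3e5, 10)          # illustrative range
X_train = cartesian_grid(u_bar, sig_y, E_mod)  # X_{19x10x10}: 1900 points
```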

Fig. 10

Anisotropic \({\mathfrak {X}}_{19\times 10 \times 10}\) training grid with \(n=1900\) training points (right) and the isotropic \({\mathfrak {X}}_{19\times 19 \times 19}\) training grid with \(n=6859\) training points (left)

The accuracy of the SVR approximation using the anisotropic training grid in comparison with the isotropic one is given in Fig. 11. It can be seen that the anisotropic training grid provides approximately the same accuracy as the isotropic grid while reducing the computational cost by a factor of four. For instance, it requires only \(n=1900\) training points to achieve an absolute error of \(\epsilon _{\max }\approx {0.4}\,\hbox {mm}\) and a relative error of \(\epsilon _\textrm{rel}\approx 0.017\), whereas the isotropic grid requires \(n=6859\) training points to obtain the same accuracy.

Fig. 11

Accuracy versus the number of training points of the SVR surrogate model using anisotropic training grid

The computational time required for the SVR surrogate model to obtain the target absolute error \(\epsilon _{\max }\approx {0.43}\,\hbox {mm}\) using the anisotropic training grid is given in Table 3.

Table 3 Computational cost of SVR surrogate model using anisotropic training grid

Table 3 shows that, after constructing the SVR surrogate model, the evaluation time is reduced to 0.18 s compared with the 2 s required to run a single FEM simulation. This corresponds to a reduction in the computational cost by a factor of 11.

3.2.1 Smoothing the response surface

In fact, the non-plastified height of the beam \(d^{el}\) discussed in Sect. 3.2 is measured by counting the number of finite elements \(N_e\) with zero centroid equivalent plastic strain \(\varepsilon _{eq}^p=0\) using Eq. (25). This mesh dependency causes the response surface to behave discontinuously in the parametric space, since the non-plastified height \(d^{el}\) can only change by multiples of the element length \(l_e\). The discontinuous behaviour can be weakened by estimating the non-plastified height of the beam from the number of integration points with zero equivalent plastic strain. However, this might only weaken the non-smoothness without removing the mesh dependency of the response surface \(d^{el}\). Alternatively, this section provides a measure of the equivalent plastic strain \(\varepsilon _{eq}^{p}\) at the centre of the beam that depends on the state of all integration points across the beam cross-section, including the ones with \(\varepsilon _{eq}^p>0\), by using two linear regression functions, defined as

$$\begin{aligned} y^{+}({{\tilde{\varepsilon }}}^{p}_{eq})=\alpha _{0}+\alpha _{1}x^{+}, \end{aligned}$$
(27)

and

$$\begin{aligned} y^{-}({{\tilde{\varepsilon }}}^{p}_{eq})=\beta _{0}+\beta _{1}x^{-}, \end{aligned}$$
(28)

where the signs \((+,-)\) indicate the positive and negative global y-coordinates of the integration points, respectively, and \(\alpha \) and \(\beta \) are the regression coefficients to be estimated. Once the regression coefficients are estimated using the least-squares method, the non-plastified height \(d^{el}\) can simply be calculated as

$$\begin{aligned} d^{el}= y^{+}({{\tilde{\varepsilon }}}^{p}_{eq}=0)-y^{-}({{\tilde{\varepsilon }}}^{p}_{eq}=0). \end{aligned}$$
(29)
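A minimal least-squares realisation of Eqs. (27)–(29) can be sketched as follows (our function name; it assumes the integration-point y-coordinates and their equivalent plastic strains are given as arrays):

```python
import numpy as np

def smoothed_del(y_coords, eps_eq):
    """Fit one line per half of the cross-section and evaluate it at eps_eq = 0."""
    y_coords = np.asarray(y_coords, dtype=float)
    eps_eq = np.asarray(eps_eq, dtype=float)
    intercepts = {}
    for sign, mask in (("+", y_coords > 0.0), ("-", y_coords < 0.0)):
        slope, intercept = np.polyfit(eps_eq[mask], y_coords[mask], 1)
        intercepts[sign] = intercept            # y(eps_eq = 0), Eqs. (27)-(28)
    return intercepts["+"] - intercepts["-"]    # Eq. (29)
```

Because the regression uses all integration points on each half, the resulting \(d^{el}\) varies continuously with the input parameters instead of jumping by the element length.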

An example of the regression functions along the beam height is given in Fig. 12.

Fig. 12

Linear regression smoothing technique for measuring the equivalent plastic strain \(\varepsilon _{eq}^{p}\)

Fig. 13

Different measures of the equivalent plastic strain \(\varepsilon ^{p}_{eq}\) to estimate the quantity of interest \(d^{el}_i\) using \(n=8000\) combinations of the input parameters, sorted in descending order

A comparison of the response surface obtained from \(n=8000\) combinations of the input parameters using different measures of the equivalent plastic strain \(\varepsilon ^{p}_{eq}\), namely at the centroid of the element, integration points and by using linear regression, is plotted in Fig. 13.

Figure 13c shows that using linear regression to measure the equivalent plastic strain provides a smooth transition of the quantity of interest \(d^{el}\) in the parametric space, unlike the centroid and integration-point measures presented in Fig. 13a and b, respectively. The SVR approximation of the non-plastified height \(d^{el}\) is given in Fig. 14.

First, it can be observed from the global relative error \(\epsilon _\textrm{rel}\), plotted in Fig. 14b, that the response smoothed using linear regression provides a significant improvement in the accuracy of the SVR approximation compared with the centroid and integration-point measures. In contrast, its local accuracy is approximately identical to that of the integration-point measure, as shown in Fig. 14a. This is mainly due to the fact that the quantity of interest \(d^{el}\) remains constant at \(d^{el}={20}\,\hbox {mm}\) when the beam is in a fully elastic state for different combinations of the input parameters. Thereafter, it exhibits a \(C^{0}\) transition in the parametric space where the cross-section begins to be partially plastified and \(d^{el}<{20}\,\hbox {mm}\), as clarified in Fig. 13.

Fig. 14

Accuracy versus the number of training points of the SVR surrogate model using different measures of the equivalent plastic strain \(\varepsilon ^{p}_{eq}\) to estimate the quantity of interest \(d^{el}\)

4 Conclusion

This study has presented a global surrogate modelling approach for mechanical systems with nonlinear elasto-plastic material behaviour based on support vector regression (SVR). First, SVR is applied to a purely phenomenological one-dimensional material model. The numerical study shows that the Matérn 5/2 kernel function with appropriately optimised parameters yields a more accurate representation of the output (the plastic strain \(\varepsilon ^{p}\)) and outperforms the Gaussian kernel. The optimal values of the SVR parameters, namely the box constraint C and the tube width \(\varepsilon \), are determined in this first numerical study. The results obtained from the one-dimensional elasto-plastic example are then applied to continuum elasto-plasticity within the framework of the finite element method. A four-point bending beam with elasto-plastic material behaviour is chosen as a technical application. The quantity of interest is the non-plastified cross-sectional height \(d^{el}\), with the displacement magnitude \({\bar{u}}\), the Young’s modulus E and the yield stress \(\sigma ^y\) as input parameters. It is shown that the post-processing and the element size of the finite element mesh have a significant impact on the accuracy of the SVR approximation, since they cause the quantity of interest to behave discontinuously in the parametric space. To overcome this problem, the quantity of interest \(d^{el}\) is smoothed over the input space by using linear regression. The results show that SVR can provide a sufficiently accurate approximation of highly nonlinear and non-smooth response surfaces while the computational cost is reduced by a factor of 11. Finally, it is worth mentioning that only three input parameters are considered in this paper; the accuracy and the performance of the SVR surrogate model might deteriorate for high-dimensional non-smooth engineering problems.