Article

Granular Elastic Network Regression with Stochastic Gradient Descent

Linjie He 1, Yumin Chen 1,*, Caiming Zhong 2 and Keshou Wu 1
1 College of Computer Science and Technology, Xiamen University of Technology, Xiamen 361024, China
2 College of Science and Technology, Ningbo University, Ningbo 315211, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(15), 2628; https://doi.org/10.3390/math10152628
Submission received: 6 June 2022 / Revised: 8 July 2022 / Accepted: 25 July 2022 / Published: 27 July 2022
(This article belongs to the Special Issue Soft Computing and Uncertainty Learning with Applications)

Abstract
Linear regression uses linear functions to model the relationship between a dependent variable and one or more independent variables. Linear regression models have been widely used in fields such as finance, industry, and medicine. To address the difficulty that traditional linear regression models have in handling uncertain data, we propose a granule-based elastic network regression model. First, we construct granules and granular vectors by granulation methods. Then, we define multiple granular operation rules so that the model can effectively handle uncertain data. Further, we define the granular norm and the granular vector norm, which we use to design the granular loss function and construct the granular elastic network regression model. After that, we derive the gradient of the granular loss function and design a gradient descent optimization algorithm for the granular elastic network. Finally, we perform experiments on UCI datasets to verify the validity of the granular elastic network, finding that it fits the data better than traditional linear regression models.

1. Introduction

In the 1970s, Zadeh, an American expert in automatic control, introduced the concept of fuzzy sets [1]. Fuzzy sets and fuzzy logic have attracted a large number of scholars worldwide. Hobbs proposed the concept of granularity [2], which is consistent with human cognition, memory, reasoning, and logic. After this, Zadeh introduced the concept of fuzzy information granulation [3], stating that information granulation divides a whole into many information granules, which are then studied. In 1999, T.Y. Lin proposed granular computing [4] on this basis and applied it to data mining [5]. Skowron proposed a granular computation method based on rough set theory [6]. Liu [7] constructed an approach to information granule reasoning from the perspective of fuzzy logic. In 2001, Yao defined the neighborhood system [8] and studied the connection between hierarchical granulation and rough set approximation. In 2002, Miao proposed a knowledge-based approach to granular computing [9] and later reconstructed the granular computing structure from the perspective of set theory [10]. Wang et al. applied granular computing to big data information processing [11]. Chen et al. [12] presented a power-set-tree-based rough set feature selection method. In 2014, Qian reduced the computation time of attribute approximate reduction by using parallel granular computation [13]. After that, Chen [14] proposed granule structures, distances, and measures in neighborhood systems and further introduced a rough granular KNN algorithm. Building on Chen's work [14], Li [15] combined the boosting algorithm with the granular KNN algorithm to further improve its performance. Although scholars understand the definition of granules differently, the structural idea of granules has attracted more and more researchers, and granular computing has been applied to a variety of fields, including information systems [16], image processing [17], cognitive systems [18], attribute selection [19,20], multi-attribute decision making [21], and others [22,23,24,25,26].
Machine learning is a general data processing technique divided into unsupervised and supervised learning. Unsupervised learning includes PCA, locally linear embedding, K-means clustering, and other algorithms. Supervised learning mainly includes K-nearest neighbors, decision trees, linear regression, logistic regression, and support vector machines. Regression models are a form of supervised learning and include linear regression, ridge regression, lasso regression, and elastic network regression. Linear regression [27] models the relationship between a dependent variable and one or more independent variables with a linear function. By adding an L2 penalty that shrinks the weights toward one another, ridge regression [28] improves the fitting ability of regression models on ill-conditioned data. Lasso regression [29] suppresses the overfitting of linear regression to some extent: by adding L1 regularization it drives some weight coefficients to zero and obtains a sparse model. The elastic network combines ridge and lasso regression to both sparsify the weights and shrink the weight coefficients [30]. However, these traditional regression models struggle with uncertain data and set-valued data, while granular computing is an effective tool for handling uncertain data; we therefore construct a granule-based elastic network regression model. The contributions of this paper are as follows.
First, we propose a granulation approach based on feature distance and define various granular operations and granularity measures. Second, we construct the granular elastic network regression system and give its loss function. Third, we optimize the granular elastic network regression equation by stochastic gradient descent. Finally, we conduct experiments on UCI datasets to verify the effectiveness of the granular elastic network and find that it fits better than traditional regression models.
This article is organized as follows. Section 1 briefly introduces granular computing and regression algorithms. Section 2 defines the granulation approach. In Section 3, we propose granular elastic network regression together with its optimizer and optimization algorithm. Section 4 presents the experimental results, and Section 5 concludes the paper.

2. Granulation

Fuzzy sets are an effective tool for dealing with uncertain information. For an information system $U = (X, C, D)$, $X = \{x_1, x_2, x_3, \ldots, x_n\}$ is the sample set, $C = \{c_1, c_2, c_3, \ldots, c_m\}$ is the corresponding feature set, and $D$ is the corresponding decision set. For a given sample $x \in X$ and a single feature $c \in C$, $v(x, c) \in [0, 1]$ denotes the value of sample $x$ normalized on feature $c$.
Definition 1.
In the data set $U = (X, C, D)$, let $x_1, x_2 \in X$ be samples and $c \in C$ a single feature. The distance between $x_1$ and $x_2$ on feature $c$ is:
$$s_c(x_1, x_2) = |v(x_1, c) - v(x_2, c)|.$$
Definition 2.
For the data set $U = (X, C, D)$, we granulate a sample $x \in X$ on a feature $c \in C$. Granulating $x$ on feature $c$ forms a fuzzy granule, defined as:
$$g_c(x) = \{g_c(x)_j\}_{j=1}^{k} = \{r_j\}_{j=1}^{k} = \{r_1, r_2, \ldots, r_k\}, \quad \text{where } r_j = s_c(x, x_j) = |v(x, c) - v(x_j, c)|.$$
$r_j$ is the distance between sample $x$ and sample $x_j$ on feature $c$. It follows from Definition 1 that $r_j = s_c(x, x_j) \in [0, 1]$. We call $g_c(x)$ a granule and $g_c(x)_j$ the $j$th granule kernel of $g_c(x)$; a granule consists of its granule kernels.
Definition 3.
For the data set $U = (X, C, D)$, let $x \in X$ and let $B = \{b_1, b_2, b_3, \ldots, b_m\} \subseteq C$ be any feature subset. The granular vector of $x$ on the feature subset $B$ is:
$$G_B(x) = (g_{b_1}(x), g_{b_2}(x), g_{b_3}(x), \ldots, g_{b_m}(x))^T.$$
$g_{b_m}(x)$ is the granule of $x$ on feature $b_m$. A granular vector is made up of granules, and granules consist of granule kernels, so a granular vector can also be represented by its granule kernel matrix:
$$G(x) = \begin{pmatrix} g_1(x)_1 & g_1(x)_2 & \cdots & g_1(x)_k \\ g_2(x)_1 & g_2(x)_2 & \cdots & g_2(x)_k \\ \vdots & \vdots & \ddots & \vdots \\ g_m(x)_1 & g_m(x)_2 & \cdots & g_m(x)_k \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1k} \\ r_{21} & r_{22} & \cdots & r_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mk} \end{pmatrix}$$
The elements of a granular vector $G(x)$ are granules, whereas the elements of a conventional vector are real numbers.
Definition 4.
For the data set $U = (X, C, D)$, let $x \in X$ with decision value $d_x \in D$. The extended decision granule of $x$ is:
$$Y(x) = \{d_x, d_x, \ldots, d_x\},$$
where the number of elements equals the size of $X$.
Example 1.
The following is an example of granulation.
As shown in Table 1, the sample set $X = \{x_1, x_2, x_3, x_4\}$ is granulated on feature $a$ as:
$$g_a(x_1) = \{0, 0.65, 0.6, 0.3\}, \quad g_a(x_2) = \{0.65, 0, 0.05, 0.35\}, \quad g_a(x_3) = \{0.6, 0.05, 0, 0.3\}, \quad g_a(x_4) = \{0.3, 0.35, 0.3, 0\}.$$
The sample set is granulated on feature $b$ as:
$$g_b(x_1) = \{0, 0.1, 0.5, 0.2\}, \quad g_b(x_2) = \{0.1, 0, 0.6, 0.3\}, \quad g_b(x_3) = \{0.5, 0.6, 0, 0.3\}, \quad g_b(x_4) = \{0.2, 0.3, 0.3, 0\}.$$
The sample set is granulated on feature $c$ as:
$$g_c(x_1) = \{0, 0.1, 0.3, 0.5\}, \quad g_c(x_2) = \{0.1, 0, 0.4, 0.4\}, \quad g_c(x_3) = \{0.3, 0.4, 0, 0.8\}, \quad g_c(x_4) = \{0.5, 0.4, 0.8, 0\}.$$
The feature granular vectors of the samples are:
$$G_{\{a,b,c\}}(x_1) = (g_a(x_1), g_b(x_1), g_c(x_1))^T = (\{0, 0.65, 0.6, 0.3\}, \{0, 0.1, 0.5, 0.2\}, \{0, 0.1, 0.3, 0.5\})^T,$$
$$G_{\{a,b,c\}}(x_2) = (g_a(x_2), g_b(x_2), g_c(x_2))^T = (\{0.65, 0, 0.05, 0.35\}, \{0.1, 0, 0.6, 0.3\}, \{0.1, 0, 0.4, 0.4\})^T,$$
$$G_{\{a,b,c\}}(x_3) = (g_a(x_3), g_b(x_3), g_c(x_3))^T = (\{0.6, 0.05, 0, 0.3\}, \{0.5, 0.6, 0, 0.3\}, \{0.3, 0.4, 0, 0.8\})^T,$$
$$G_{\{a,b,c\}}(x_4) = (g_a(x_4), g_b(x_4), g_c(x_4))^T = (\{0.3, 0.35, 0.3, 0\}, \{0.2, 0.3, 0.3, 0\}, \{0.5, 0.4, 0.8, 0\})^T.$$
The decision values of $X = \{x_1, x_2, x_3, x_4\}$ are extended into decision granules as:
$$Y(x_1) = \{22.5, 22.5, 22.5, 22.5\}, \quad Y(x_2) = \{33, 33, 33, 33\}, \quad Y(x_3) = \{45, 45, 45, 45\}, \quad Y(x_4) = \{10.05, 10.05, 10.05, 10.05\}.$$
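To make the granulation concrete, here is a minimal Python sketch of Definitions 1-4 that reproduces Example 1 from the data of Table 1. The function names (`granule`, `granular_vector`, `decision_granule`) are our own illustration, not the authors' implementation.

```python
import numpy as np

# Table 1: rows are samples x1..x4, columns are the normalized features a, b, c.
X = np.array([
    [1.00, 0.5, 0.6],  # x1
    [0.35, 0.4, 0.5],  # x2
    [0.40, 1.0, 0.9],  # x3
    [0.70, 0.7, 0.1],  # x4
])
y = np.array([22.5, 33.0, 45.0, 10.05])  # decision attribute D

def granule(X, i, c):
    """Granule g_c(x_i): distances |v(x_i,c) - v(x_j,c)| to every sample x_j (Defs. 1-2)."""
    return np.abs(X[i, c] - X[:, c])

def granular_vector(X, i):
    """Granule kernel matrix G(x_i): one granule (row) per feature (Def. 3)."""
    return np.stack([granule(X, i, c) for c in range(X.shape[1])])

def decision_granule(y, i, n):
    """Extended decision granule Y(x_i): the decision value repeated n = |X| times (Def. 4)."""
    return np.full(n, y[i])

print(granule(X, 0, 0))           # [0.   0.65 0.6  0.3 ]  = g_a(x1)
print(granular_vector(X, 2))      # 3 x 4 granule kernel matrix of x3
print(decision_granule(y, 0, 4))  # [22.5 22.5 22.5 22.5]  = Y(x1)
```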

3. Granular Elastic Network Regression

3.1. Granular Operations and Measures

To construct the granular regression system, in this subsection, we give the granular operators and measures.
Definition 5.
Let $g_a(x)$, $g_b(x)$ be two granules of a sample $x$ on features $a$ and $b$, respectively. The addition, subtraction, multiplication, and division operations for granules are:
$$g_a(x) + g_b(x) = \{g_a(x)_j + g_b(x)_j\}_{j=1}^{k},$$
$$g_a(x) - g_b(x) = \{g_a(x)_j - g_b(x)_j\}_{j=1}^{k},$$
$$g_a(x) * g_b(x) = \{g_a(x)_j * g_b(x)_j\}_{j=1}^{k},$$
$$g_a(x) / g_b(x) = \{g_a(x)_j / g_b(x)_j\}_{j=1}^{k}.$$
Definition 6.
Let $g_a(x)$, $g_a(y)$ be two granules of samples $x$ and $y$ on the same feature $a$. The addition, subtraction, multiplication, and division operations for these granules are:
$$g_a(x) + g_a(y) = \{g_a(x)_j + g_a(y)_j\}_{j=1}^{k},$$
$$g_a(x) - g_a(y) = \{g_a(x)_j - g_a(y)_j\}_{j=1}^{k},$$
$$g_a(x) * g_a(y) = \{g_a(x)_j * g_a(y)_j\}_{j=1}^{k},$$
$$g_a(x) / g_a(y) = \{g_a(x)_j / g_a(y)_j\}_{j=1}^{k}.$$
The result of any of the four operations on two granules is again a granule. The operations in Definition 5 act on granules of the same sample over different features, while those in Definition 6 act on granules of different samples over the same feature.
Definition 7.
Let $G(x_i) = (g_1(x_i), g_2(x_i), g_3(x_i), \ldots, g_m(x_i))^T$ and $G(x_j) = (g_1(x_j), g_2(x_j), g_3(x_j), \ldots, g_m(x_j))^T$ be two granular vectors. Their dot product is defined as:
$$G(x_i) \cdot G(x_j) = g_1(x_i) * g_1(x_j) + g_2(x_i) * g_2(x_j) + g_3(x_i) * g_3(x_j) + \cdots + g_m(x_i) * g_m(x_j).$$
The dot product of two granular vectors yields a granule.
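Because the four granule operations act elementwise on the kernels, Definitions 5-7 translate directly into array operations. The sketch below assumes granules are stored as equal-length NumPy arrays and granular vectors as $m \times k$ kernel matrices, as in the previous sketch.

```python
import numpy as np

# Granule operations (Definitions 5 and 6) are elementwise on the kernels.
def granule_add(g1, g2): return g1 + g2
def granule_sub(g1, g2): return g1 - g2
def granule_mul(g1, g2): return g1 * g2
def granule_div(g1, g2): return g1 / g2   # assumes g2 has no zero kernels

def granular_dot(G1, G2):
    """Dot product of two granular vectors (Definition 7): sum the elementwise
    granule products over the m features; the result is one granule of length k."""
    return (G1 * G2).sum(axis=0)
```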
Definition 8.
Let $g(x) = \{r_j\}_{j=1}^{k}$ be a granule. Its size is:
$$\mathrm{size}(g(x)) = |g(x)| = \sum_{j=1}^{k} r_j.$$
Definition 9.
Let $g(x) = \{r_j\}_{j=1}^{k}$ be a granule. Its granular norms are as follows:
(a). The 1-norm of the granule:
$$\mathrm{Norm}_1(g(x)) = \|g(x)\|_1 = \sum_{j=1}^{k} |r_j|,$$
(b). The 2-norm of the granule:
$$\mathrm{Norm}_2(g(x)) = \|g(x)\|_2 = \sqrt{\sum_{j=1}^{k} r_j^2},$$
(c). The p-norm of the granule:
$$\mathrm{Norm}_p(g(x)) = \|g(x)\|_p = \Big(\sum_{j=1}^{k} r_j^p\Big)^{\frac{1}{p}}.$$
Definition 10.
Let $G(x_i)$ be a granular vector with granule kernel matrix $(r_{ij})_{m \times k}$ as in Definition 3. Its granular vector norms are as follows:
(a). The 11-norm of the granular vector:
$$\mathrm{Norm}_{11}(G(x_i)) = \|G(x_i)\|_{11} = \sum_{i=1}^{m} \sum_{j=1}^{k} |r_{ij}|,$$
(b). The 12-norm of the granular vector:
$$\mathrm{Norm}_{12}(G(x_i)) = \|G(x_i)\|_{12} = \sum_{i=1}^{m} \sqrt{\sum_{j=1}^{k} r_{ij}^2},$$
(c). The 21-norm of the granular vector:
$$\mathrm{Norm}_{21}(G(x_i)) = \|G(x_i)\|_{21} = \sqrt{\sum_{i=1}^{m} \Big(\sum_{j=1}^{k} |r_{ij}|\Big)^2},$$
(d). The 22-norm of the granular vector:
$$\mathrm{Norm}_{22}(G(x_i)) = \|G(x_i)\|_{22} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{k} r_{ij}^2}.$$
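The following sketch implements Definitions 8-10 on a kernel matrix `G` of shape (m, k). The square-root placements in the 12-, 21-, and 22-norms follow our reconstruction of the formulas above and should be read as an assumption.

```python
import numpy as np

def granule_size(g):
    """Size of a granule (Definition 8): the sum of its kernels."""
    return g.sum()

def granule_norm(g, p=2):
    """p-norm of a granule (Definition 9)."""
    return (np.abs(g) ** p).sum() ** (1.0 / p)

# Granular vector norms (Definition 10) on an m x k kernel matrix G.
def norm_11(G): return np.abs(G).sum()                              # sum of all kernels
def norm_12(G): return np.sqrt((G ** 2).sum(axis=1)).sum()          # 2-norm per granule, then summed
def norm_21(G): return np.sqrt((np.abs(G).sum(axis=1) ** 2).sum())  # 1-norm per granule, then 2-norm
def norm_22(G): return np.sqrt((G ** 2).sum())                      # Frobenius-style norm
```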

3.2. Granular Elasticity Regression Model

In the previous subsection, we defined several granular operations and granular norms. In the following, we construct our granular elastic network regression model with the defined granular operations.
Definition 11.
In the regression data set $U = (X, C, D)$, a sample $x \in X$ is granulated according to Definition 3 and extended with a constant granule as $G(x) = (g_1(x), g_2(x), g_3(x), \ldots, g_m(x), 1)$, and the weight granular vector is $W = (w_1, w_2, w_3, \ldots, w_m, b)$. The granular regression equation is defined as:
$$Reg(x) = W \cdot G(x) = w_1 * g_1(x) + w_2 * g_2(x) + \cdots + w_m * g_m(x) + 1 * b.$$
As with weight initialization in traditional linear regression, the granular regression network initializes the granular weights randomly from a normal distribution.
In traditional regression models, lasso regression makes the weights sparse, while ridge regression shrinks the weight coefficients toward one another. By combining the L1 penalty with the L2 penalty, the elastic network can both sparsify the weights and shrink the weight coefficients. Analogously, in the granular regression system the granular elastic network uses a granular vector 11-norm penalty and a granular vector 22-norm penalty. The loss function of granular elastic network regression is defined as follows.
Definition 12.
In the regression data set $U = (X, C, D)$, let the weight granular vector be $W = (w_1, w_2, \ldots, w_m, b)$, the granular vector be $G(x) = (g_1(x), g_2(x), g_3(x), \ldots, g_m(x), 1)$, and the decision granular vector be $G(y) = (g_1(y), g_2(y), g_3(y), \ldots, g_m(y), 0)$. The granular loss function is defined as
$$Ge = \frac{1}{2}\Big\|W \cdot G(x) - \frac{1}{m} G(y)\Big\|_2^2 + \alpha \|W\|_{11} + \frac{1}{2}\beta \|W\|_{22}^2,$$
where $W \cdot G(x) - \frac{1}{m} G(y)$ is a granule.
Differentiating $Ge$ with respect to the weights gives:
$$\frac{\partial Ge}{\partial w} = -1 * G(x) * \Big(\frac{1}{m}G(y) - W \cdot G(x)\Big) + \alpha * \mathrm{sign}(W) + \beta * W.$$
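A minimal sketch of Definition 12 and its gradient, assuming (as in the earlier sketches) that $G(x)$ is stored as an $(m+1) \times k$ kernel matrix whose last row is the all-ones bias granule, `W` is a real weight vector of length $m+1$, and `g_y` is the sample's extended decision granule:

```python
import numpy as np

def predict(W, Gx):
    """Reg(x) = W . G(x): weighted sum of the granules, itself a granule of length k."""
    return W @ Gx

def loss(W, Gx, g_y, alpha, beta, m):
    """Granular elastic-network loss Ge (Definition 12)."""
    residual = predict(W, Gx) - g_y / m            # a granule
    return (0.5 * (residual ** 2).sum()            # 1/2 ||W.G(x) - G(y)/m||_2^2
            + alpha * np.abs(W).sum()              # alpha ||W||_11
            + 0.5 * beta * (W ** 2).sum())         # beta/2 ||W||_22^2

def gradient(W, Gx, g_y, alpha, beta, m):
    """dGe/dW = G(x) . (W.G(x) - G(y)/m) + alpha sign(W) + beta W."""
    residual = predict(W, Gx) - g_y / m
    return Gx @ residual + alpha * np.sign(W) + beta * W
```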

3.3. Granular Elastic Network Optimization Algorithm

In order to optimize the granular elastic network and attain better regression results, we solve the following problem:
$$W_c = \arg\min_{W_c} Ge = \arg\min_{W_c} \frac{1}{2}\Big\|W \cdot G(x) - \frac{1}{m}G(y)\Big\|_2^2 + \alpha\|W\|_{11} + \frac{1}{2}\beta\|W\|_{22}^2.$$
We use stochastic gradient descent to solve for the granular elastic network parameters. To measure the granular regression error, we define the granular root mean square error ($G_{RMSE}$):
$$G_{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\Big\|W_c \cdot G(x_i) - \frac{1}{m}G(y_i)\Big\|_2^2}.$$
The details of the stochastic gradient descent optimized granular elastic network regression algorithm are shown in Algorithm 1. In the training algorithm, m represents the maximum number of training iterations. η is the learning rate.
Algorithm 1 Granular elastic network optimization algorithm
Input: The training set $U = (X, Y)$, where $x_i \in X$ is a feature vector and $y_i \in Y$ is its regression decision value, $i = 1, 2, \ldots, n$; the learning rate $\eta$; the maximum number of iterations $m$.
Output: Granular weight matrix $W$.
(1) Granulate the sample set $X$ over the feature set $(X, C)$ to obtain $G(X)$
(2) Extend the decision set $Y$ into granules $G(Y)$
(3) Construct a granular elastic network and randomly initialize its granular weights
(4) $i = 0$
(5) for $i$ to $m$
(6)      $G_{RMSE}^{i} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\|W_c^{i} \cdot G(x_i) - \frac{1}{m}G(y_i)\|_2^2}$
(7)      $W_c^{i+1} = W_c^{i} - \eta \frac{\partial Ge}{\partial w}$
(8)      $G_{RMSE}^{i+1} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\|W_c^{i+1} \cdot G(x_i) - \frac{1}{m}G(y_i)\|_2^2}$
(9)      $\Delta RMSE = G_{RMSE}^{i} - G_{RMSE}^{i+1}$
(10)     if $\Delta RMSE < \xi$
(11)           break
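For concreteness, the following is a runnable sketch of Algorithm 1 under our assumptions, reusing the hypothetical helpers `granular_vector`, `decision_granule`, and `gradient` from the earlier sketches; the tolerance `xi` stops training once the change in $G_{RMSE}$ becomes small.

```python
import numpy as np

def granular_rmse(W, Gs, Ys, m):
    """Granular RMSE over all n training samples."""
    errs = [((W @ Gx - g_y / m) ** 2).sum() for Gx, g_y in zip(Gs, Ys)]
    return np.sqrt(np.mean(errs))

def train(X, y, alpha=0.5, beta=0.5, eta=0.001, max_iter=800, xi=1e-6):
    n, m = X.shape
    k = n  # each granule holds one kernel per training sample
    # Steps (1)-(2): granulate the features and extend the decisions.
    Gs = [np.vstack([granular_vector(X, i), np.ones(k)]) for i in range(n)]
    Ys = [decision_granule(y, i, k) for i in range(n)]
    # Step (3): random initialization of the granular weights.
    rng = np.random.default_rng(0)
    W = rng.standard_normal(m + 1)
    prev = granular_rmse(W, Gs, Ys, m)
    for t in range(max_iter):                 # steps (5)-(11)
        i = rng.integers(n)                   # stochastic: one sample per update
        W -= eta * gradient(W, Gs[i], Ys[i], alpha, beta, m)
        cur = granular_rmse(W, Gs, Ys, m)
        if abs(prev - cur) < xi:              # step (10): converged
            break
        prev = cur
    return W
```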

4. Experimental Analysis

Multiple UCI datasets, including Estate, Concrete, QSAR fish, Airfoil, Energy, and Yacht, are used for comparison experiments. To analyze the proposed method comprehensively, we select the Estate, Concrete, and QSAR fish datasets for hyperparameter analysis and fitting analysis. The Concrete dataset consists of 8 conditional attributes and 1 decision attribute, the Estate dataset of 9 conditional attributes and 1 decision attribute, and the QSAR fish dataset of 6 conditional attributes and 1 decision attribute. We randomly select 80% of the samples as the training set and the remaining 20% as the test set, with a fixed random seed so that each selection is reproducible.
We measure the model by the mean absolute error ($MAE$) and the coefficient of determination ($R^2$). $MAE$ is expressed as:
$$MAE = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i|,$$
where $\hat{y}_i$ denotes the predicted value and $y_i$ the true value. A smaller $MAE$ indicates a smaller average prediction error. To define $R^2$, we need the sum of squares for regression ($SSR$), the sum of squares for error ($SSE$), and the total sum of squares ($SST$). $SSR$ is expressed as:
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \quad \text{where } \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
$SSE$ is represented as:
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
$SST$ is defined as:
$$SST = SSR + SSE = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$
$R^2$ is calculated by
$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$$
On the test set, a larger $R^2$ indicates a better fit: the independent variables explain a greater share of the variation in the dependent variable. The maximum value of $R^2$ is 1.
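Both metrics are straightforward to compute; a short sketch, assuming `y_true` and `y_hat` are arrays of true and predicted values:

```python
import numpy as np

def mae(y_true, y_hat):
    """Mean absolute error."""
    return np.abs(y_hat - y_true).mean()

def r2(y_true, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    sse = ((y_true - y_hat) ** 2).sum()
    sst = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - sse / sst

# Toy illustration (values are made up, not from the paper's experiments):
y_true = np.array([22.5, 33.0, 45.0, 10.05])
y_hat = np.array([21.0, 34.0, 43.5, 12.00])
print(mae(y_true, y_hat), r2(y_true, y_hat))
```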

4.1. Convergence Analysis

We select two learning rates for convergence analysis. First, we set the learning rate to 0.001 and the loss function to MSE. Figure 1, Figure 2 and Figure 3 show the training processes of the granular elastic network on the Concrete, Estate, and QSAR datasets at this learning rate. The granular elastic network converges on all three datasets: by 800 iterations it has converged on the Estate and QSAR datasets with smooth loss curves, while on the Concrete dataset it also converges but with a more volatile loss curve.
Then, the learning rate is raised to 0.005. Figure 4, Figure 5 and Figure 6 show the corresponding training processes. For the Concrete dataset, comparing Figure 1 and Figure 4 shows that the convergence curve at learning rate 0.005 is smoother than at 0.001. For the QSAR and Estate datasets, the training curves at 0.005 are more tortuous, but they are still able to converge.

4.2. Impact of the α and β Penalty Coefficient Ratio

In this subsection, we show the effect of the ratio of $\alpha$ to $\beta$ on the different test sets. To facilitate the analysis, we set $\alpha + \beta = 1$ and define the L11 ratio $= \frac{\alpha}{\alpha + \beta}$, i.e., the share of the L11 penalty in the combined L11 and L22 penalty. A smaller L11 ratio means a smaller L11 share, and a larger L11 ratio means a smaller L22 share.
For the Concrete test set, the effect of the L11 ratio is shown in Figure 7: the $MAE$ is smallest at an L11 ratio of 0.5 and largest, at about 4.894, at an L11 ratio of 0.8. From Figure 8, the $MAE$ of the Estate test set is smallest at an L11 ratio of 0.9 and largest at an L11 ratio of 0.6. For the QSAR test set, as shown in Figure 9, the $MAE$ reaches a minimum of about 0.799 at an L11 ratio of 0.8 and a maximum of about 0.831 at an L11 ratio of 0.2.

4.3. Fitting Analysis

To better show the regression results, we select 50 samples from each test set and plot the regression fitting curves in Figure 10, Figure 11 and Figure 12. Figure 10 shows the fitting curve of the granular elastic network on the Concrete test set: the network is accurate on the first 15 samples, but the regression error is larger on the last 10. For the Estate test set (Figure 11), the first 5 samples are fitted very accurately, samples 9 to 14 are fitted poorly, and samples 35 through 50 are fitted with little error. Comparing Figure 10 and Figure 12 shows that the granular elastic network does not fit the QSAR dataset as well as the Concrete dataset.

4.4. Comparison of Granular Elastic Network Regression and Classical Regression Algorithms

To illustrate the effectiveness of the proposed method, we compare the granular elastic network with several classical regression algorithms on several UCI datasets, using the mean absolute error ($MAE$) and the coefficient of determination ($R^2$) to measure fitting performance. For ridge regression, lasso regression, elastic network regression, and granular elastic network regression, 5-fold cross-validation is performed on the training set to select the best regularization coefficients, which are then used on the test set. The best results in Table 2 and Table 3 are shown in bold.
Table 2 shows the $MAE$ results of the granular elastic network and the traditional regression models on the test sets; a smaller $MAE$ means a smaller regression error. Except on the Daily demand and QSAR datasets, the granular elastic network has a smaller $MAE$ than the classical regression algorithms, and on the Concrete, Energy, and Yacht datasets its $MAE$ is especially small by comparison.
The coefficients of determination ($R^2$) of the granular elastic network and the traditional regression algorithms on the test sets are shown in Table 3. A higher $R^2$ means better regression fitting performance, with a maximum value of 1. On the vast majority of datasets, the $R^2$ of the proposed algorithm is better than that of the classical regression algorithms. On the Yacht dataset, even though the $R^2$ of some traditional algorithms is negative, the $R^2$ of the granular elastic network is as high as about 0.93.
In general, the granular elastic network regression algorithm proposed in this paper has better regression performance than the classical regression algorithms.

5. Conclusions and Discussion

Traditional regression algorithms are mostly applicable only to real-valued data, not to fuzzy or set-valued data. We design granulation methods and algorithms for fuzzy and set-valued data, propose a novel granular regression system, and add two regularization penalties to it. We then apply the proposed granular elastic network optimization algorithm to granular elastic network learning. Finally, we conducted experiments on UCI datasets and compared the model with several traditional regression algorithms. The experimental results show that the granular elastic network proposed in this paper achieves smaller regression error and a better fit than traditional regression algorithms.
As the granulation method in this paper is global, the proposed granular regression method has difficulty processing big data. In future work, we will propose new granulation approaches to apply granular regression to big data and to further improve regression performance, and we will extend the granular regression system to nonlinear regression.

Author Contributions

Conceptualization, L.H. and Y.C.; methodology, L.H.; software, L.H.; validation, L.H.; formal analysis, L.H. and C.Z.; investigation, L.H.; resources, C.Z. and K.W.; data curation, L.H. and Y.C.; writing—original draft preparation, L.H. and Y.C.; visualization, L.H.; supervision, Y.C. and K.W.; project administration, Y.C. and K.W.; funding acquisition, Y.C. and C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (No. 61976183, No. 62172242), the Natural Science Foundation of Fujian Province (No. 2019J01850).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zadeh, L.A. Fuzzy sets and information granularity. Adv. Fuzzy Set Theory Appl. 1979, 11, 3–18.
  2. Hobbs, J.R. Granularity. In Proceedings of the IJCAI, Los Angeles, CA, USA, 18–23 August 1985; pp. 432–435.
  3. Zadeh, L.A. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst. 1997, 90, 111–127.
  4. Lin, T.Y. Data Mining: Granular Computing Approach. Lect. Notes Comput. Sci. 1999, 1574, 24–33.
  5. Lin, T.Y.; Zadeh, L.A. Special issue on granular computing and data mining. Int. J. Intell. Syst. 2004, 19, 565–566.
  6. Bazan, J.G.; Nguyen, H.S.; Nguyen, S.H.; Synak, P.; Wróblewski, J. Rough Set Algorithms in Classification Problem. In Rough Set Methods and Applications; Springer: Berlin/Heidelberg, Germany, 2000; pp. 49–88.
  7. Liu, Q.; Jiang, S.L. Reasoning about information granules based on rough logic. Lect. Notes Comput. Sci. 2002, 2475, 139–143.
  8. Yao, Y. Information granulation and rough set approximation. Int. J. Intell. Syst. 2001, 16, 87–104.
  9. Miao, D.Q.; Fan, S. The calculation of knowledge granulation and its application. Syst. Eng.-Theory Pract. 2002, 22, 48–56.
  10. Miao, D.Q.; Xu, F.F.; Yao, Y.Y.; Wei, L. Set-theoretic formulation of granular computing. Chin. J. Comput. 2012, 35, 351–363.
  11. Wang, G.Y.; Zhang, Q.H.; Ma, X.A.; Yang, Q.S. Granular computing models for knowledge uncertainty. J. Softw. 2011, 22, 676–694.
  12. Chen, Y.; Miao, D.Q.; Wang, R.; Wu, K. A rough set approach to feature selection based on power set tree. Knowl.-Based Syst. 2011, 24, 275–281.
  13. Qian, J.; Miao, D.; Zhang, Z.; Yue, X. Parallel attribute reduction algorithms using MapReduce. Inform. Sci. 2014, 279, 671–690.
  14. Chen, Y.M.; Qin, N.; Li, W.; Xu, F.F. Granule structures, distances and measures in neighborhood systems. Knowl.-Based Syst. 2019, 165, 268–281.
  15. Li, W.; Chen, Y.M.; Song, Y.P. Boosted K-nearest neighbor classifiers based on fuzzy granules. Knowl.-Based Syst. 2020, 195, 105606.
  16. Chiaselotti, G.; Gentile, T.; Infusino, F. Granular computing on information tables: Families of subsets and operators. Inform. Sci. 2018, 442, 72–102.
  17. Lei, T.; Jia, X.; Zhang, Y.; Liu, S.; Meng, H.; Nandi, A.K. Superpixel-Based Fast Fuzzy C-Means Clustering for Color Image Segmentation. IEEE Trans. Fuzzy Syst. 2019, 27, 1753–1766.
  18. Fujita, H.; Gaeta, A.; Loia, V.; Orciuoli, F. Resilience analysis of critical infrastructures: A cognitive approach based on granular computing. IEEE Trans. Cybern. 2019, 49, 1835–1848.
  19. Chen, Y.M.; Miao, D.Q.; Wang, R.Z. A rough set approach to feature selection based on ant colony optimization. Pattern Recognit. Lett. 2010, 31, 226–233.
  20. Hu, Q.H.; Yu, D.R.; Liu, J.F.; Wu, C.X. Neighborhood rough set based heterogeneous feature subset selection. Inf. Sci. 2008, 178, 3577–3594.
  21. Li, D.F. TOPSIS-based nonlinear-programming methodology for multiattribute decision making with interval-valued intuitionistic fuzzy sets. IEEE Trans. Fuzzy Syst. 2018, 26, 391.
  22. Lin, G.P.; Liang, J.Y.; Li, J.J. A fuzzy multigranulation decision-theoretic approach to multi-source fuzzy information systems. Knowl.-Based Syst. 2016, 91, 102–113.
  23. Mendel, J.M.; Bonissone, P.P. Critical Thinking about Explainable AI (XAI) for Rule-Based Fuzzy Systems. IEEE Trans. Fuzzy Syst. 2021, 29, 3579–3593.
  24. Cosme, L.B.; Caminhas, W.M.; Dangelo, M.F.; Palhares, R.M. A novel fault-prognostic approach based on interacting multiple model filters and fuzzy systems. IEEE Trans. Ind. Electron. 2018, 66, 519–528.
  25. Wang, L.; Dong, J. Adaptive Fuzzy Consensus Tracking Control for Uncertain Fractional-Order Multiagent Systems With Event-Triggered Input. IEEE Trans. Fuzzy Syst. 2022, 30, 310–320.
  26. Hu, H.; Pang, L.; Tian, D.; Shi, Z. Perception granular computing in visual haze-free task. Expert Syst. Appl. 2014, 41, 2729–2741.
  27. Naseem, I.; Togneri, R.; Bennamoun, M. Linear regression for face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 2106–2112.
  28. Chen, Y.R.; Rezapour, A.; Tzeng, W.G. Privacy-preserving ridge regression on distributed data. Inf. Sci. 2018, 451, 34–49.
  29. Tibshirani, R.J. Regression shrinkage and selection via the LASSO. J. R. Statist. Soc. 1996, 73, 273–282.
  30. Zhang, Z.; Lai, Z.; Xu, Y.; Shao, L.; Wu, J.; Xie, G.S. Discriminative elastic-net regularized linear regression. IEEE Trans. Image Process. 2017, 26, 1466–1481.
Figure 1. Training process for the Concrete dataset with learning rate 0.001.
Figure 2. Training process for the Estate dataset with learning rate 0.001.
Figure 3. Training process for the QSAR dataset with learning rate 0.001.
Figure 4. Training process for the Concrete dataset with learning rate 0.005.
Figure 5. Training process for the Estate dataset with learning rate 0.005.
Figure 6. Training process for the QSAR dataset with learning rate 0.005.
Figure 7. Impacts of the L11 ratio on the Concrete dataset.
Figure 8. Impacts of the L11 ratio on the Estate dataset.
Figure 9. Impacts of the L11 ratio on the QSAR dataset.
Figure 10. The regression fitting curve for the Concrete test set.
Figure 11. The regression fitting curve for the Estate test set.
Figure 12. The regression fitting curve for the QSAR test set.
Table 1. A data set.

| X | a | b | c | D |
| --- | --- | --- | --- | --- |
| $x_1$ | 1 | 0.5 | 0.6 | 22.5 |
| $x_2$ | 0.35 | 0.4 | 0.5 | 33 |
| $x_3$ | 0.4 | 1 | 0.9 | 45 |
| $x_4$ | 0.7 | 0.7 | 0.1 | 10.05 |
Table 2. The $MAE$ results for granular elastic network regression and classical regression algorithms.

| Datasets | Linear Regression | Ridge | Lasso | Elastic Network | Granular Elastic Network |
| --- | --- | --- | --- | --- | --- |
| Concrete | 8.1142 | 8.0997 | 8.0997 | 8.0978 | **4.3481** |
| Estate | 6.2929 | 6.2205 | 6.2284 | 6.2346 | **5.1689** |
| QSAR | 0.777 | **0.769** | 0.7765 | 0.7727 | 0.7911 |
| Airfoil | 3.5452 | 3.5472 | 3.5477 | 3.5621 | **3.2883** |
| Ale | 0.223 | 0.2217 | 0.2179 | 0.2169 | **0.1471** |
| Slump | 2.7413 | 2.7503 | 2.7543 | 2.7615 | **1.9253** |
| Daily demand | **71.8545** | 80.9481 | 71.8734 | 77.0038 | 85.7092 |
| Boston | 3.9358 | 3.926 | 3.917 | 3.8874 | **3.0508** |
| Energy | 2.2216 | 2.2439 | 2.2582 | 2.3377 | **0.9997** |
| Yacht | 8.3883 | 7.581 | 8.3399 | 7.462 | **1.9873** |
Table 3. The coefficient of determination results for granular elastic network regression and classical regression algorithms.

| Datasets | Linear Regression | Ridge | Lasso | Elastic Network | Granular Elastic Network |
| --- | --- | --- | --- | --- | --- |
| Concrete | 0.3937 | 0.3904 | 0.3929 | 0.3906 | **0.8842** |
| Estate | 0.5289 | 0.4995 | 0.5128 | 0.5059 | **0.6303** |
| QSAR | 0.3637 | 0.331 | 0.3626 | 0.3499 | **0.4113** |
| Airfoil | 0.1676 | 0.1621 | 0.1598 | 0.1257 | **0.3566** |
| Ale | 0.1009 | 0.0937 | 0.1011 | 0.0829 | **0.7212** |
| Slump | 0.791 | 0.7865 | 0.7926 | 0.7816 | **0.9125** |
| Daily demand | **0.5698** | 0.4637 | 0.5692 | 0.4466 | 0.3353 |
| Boston | 0.3531 | 0.3455 | 0.3437 | 0.3029 | **0.5521** |
| Energy | 0.893 | 0.8917 | 0.8906 | 0.8749 | **0.9844** |
| Yacht | 0.2779 | −0.1795 | 0.2705 | −1.3873 | **0.9299** |

