
Information Sciences

Volumes 367–368, 1 November 2016, Pages 1094-1105

A comprehensive evaluation of random vector functional link networks

https://doi.org/10.1016/j.ins.2015.09.025

Abstract

With randomly generated weights between the input and hidden layers, a random vector functional link (RVFL) network is a universal approximator for continuous functions on compact sets and offers fast learning. Although it was proposed two decades ago, the classification ability of this family of networks has not yet been fully investigated. Through a comprehensive evaluation on 121 UCI datasets, this work investigates the effects of the bias in the output layer, the direct links from the input layer to the output layer, the type of activation function in the hidden layer, the scaling of the parameter randomization, and the solution procedure for the output weights. Surprisingly, we found that the direct links play an important performance-enhancing role in RVFL, while the bias term in the output neuron has no significant effect. The ridge regression based closed-form solution was better than the one based on the Moore–Penrose pseudoinverse. Instead of using a uniform randomization in [-1,+1] for all datasets, tuning the scaling of the uniform randomization range for each dataset enhances the overall performance. Six commonly used activation functions were investigated, and we found that the hardlim and sign activation functions degrade the overall performance. These basic conclusions can serve as general guidelines for designing RVFL network based classifiers.

Introduction

Single layer feedforward neural networks (SLFN) have been widely applied to solve problems such as classification and regression because of their universal approximation capability [14], [17], [20], [31]. Conventional methods for training SLFN are back-propagation based learning algorithms [7], [10]. These iterative methods suffer from slow convergence, can get trapped in local minima, and are sensitive to the learning rate setting. The Random Vector Functional Link network (RVFL), shown in Fig. 1, is a randomized version of the functional link neural network [8], [25]: the actual values of the weights from the input layer to the hidden layer can be randomly generated in a suitable domain and kept fixed during the learning stage. The method independently developed in [35] also belongs to the family of randomized methods for training artificial neural networks with randomized input layer weights. This method [35] does not have direct links between the inputs and the outputs, whereas RVFL has highly beneficial direct links.

RVFL was proposed in [28]. Learning and generalization characteristics of RVFL were discussed in [26]. In [17], Igelnik and Pao proved that the RVFL network is a universal approximator for a continuous function on a bounded finite dimensional set, with a closed-form solution. Since then, RVFL has been employed to solve problems in diverse domains. A dynamic step-wise updating algorithm was proposed in [5] to update the output weights of the RVFL on-the-fly for both a newly added pattern and a newly added enhancement node. The RVFL network was investigated in [37] in the context of modeling and control. The authors of [37] suggested combining unsupervised placement of network nodes according to the input data density with subsequent supervised or reinforcement learning of the linear parameters of the approximator. Modeling conditional probabilities with RVFL was reported in [15].

RVFL can also be combined with other learning methods. In [6], RVFL was combined with statistical hypothesis testing and self-organization of a number of enhancement nodes to generate a new learning system called the statistical self-organizing learning system (SSOLS) for remote sensing applications. In [16], expectation maximization was combined with RVFL to improve its performance. RVFL has also been investigated in the ensemble learning framework. In [1], a decorrelated RVFL ensemble was introduced based on negative correlation learning. An RVFL based multi-source data ensemble for clinker free lime content estimation in rotary kiln sintering processes can be found in [21]. RVFL has also been widely applied to solve real-life problems. In [30], the authors reported the performance of a holistic-styled word-based approach to off-line recognition of English language script, in which a radial basis function neural network and RVFL were combined. Their approach, named the density-based random-vector functional-link net (DBRVFLN), was helpful in improving the performance of word recognition. In [29], RVFL was used in an MPEG-4 coder. In [38], RVFL was applied to pedestrian detection based on a combination of multiple features. In [39], RVFL was combined with AdaBoost in a pedestrian detection system. In [23], the authors investigated the performance of hardware implementation methods for RVFL. In [34], distributed learning of RVFL was proposed, where the training data are distributed under a decentralized information structure.

Consider an RVFL as shown in Fig. 1. As mentioned before, the weights $a_{ij}$ from the input to the enhancement nodes are randomly generated such that the activation functions $g(a_j^T x + b_j)$ are not all saturated. Following the approach in [1], all the weights are generated from a uniform distribution within $[-S, +S]$ in this work, where $S$ is a scale factor to be determined during the parameter tuning stage for each dataset. For RVFL, only the output weights $\beta$ need to be determined, by solving the following problem:
$$t_i = d_i^T \beta, \quad i = 1, 2, \ldots, P, \qquad (1)$$
where $P$ is the number of data samples, $t_i$ is the target and $d_i$ is the vector obtained by concatenating the original features and the random features. Directly solving the problem in Eq. (1) may lead to over-fitting. In practice, a regularization on the solution, such as regularized least squares or a preference for the solution with smaller norm [3], can be adopted. RVFL can be roughly divided into two classes based on the algorithm used to obtain the output weights. One is iterative RVFL, which obtains the output weights in an iterative manner based on the gradient of the error function. The other is closed-form based RVFL, which obtains the output weights in a single step. The present work focuses on closed-form based RVFL because of its efficiency. A straightforward solution within a single learning step can be achieved with a pseudo-inverse [17], [27], among which the Moore–Penrose pseudoinverse, $\beta = D^{+}T$, where $D$ and $T$ are the matrices formed by stacking the features and targets of all data samples, is most commonly used. Another alternative is the $\ell_2$ norm regularized least squares (or ridge regression), which solves the following problem:
$$\min_{\beta} \sum_{i=1}^{P} \left(t_i - d_i^T \beta\right)^2 + \lambda \|\beta\|^2.$$
The solution is given by $\beta = (D^T D + \lambda I)^{-1} D^T T$, where $\lambda$ is the regularization parameter to be tuned.
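For concreteness, the two closed-form solvers just described can be sketched as follows, assuming Python/NumPy. Here D stacks the concatenated (original plus random) features row-wise and T stacks the targets; the function names are illustrative and not taken from the authors' implementation.

```python
import numpy as np

def solve_pseudoinverse(D, T):
    """Moore-Penrose solution: beta = D^+ T."""
    return np.linalg.pinv(D) @ T

def solve_ridge(D, T, lam):
    """Ridge regression solution: beta = (D^T D + lam*I)^(-1) D^T T."""
    m = D.shape[1]
    # Solve the linear system instead of explicitly inverting the matrix.
    return np.linalg.solve(D.T @ D + lam * np.eye(m), D.T @ T)
```

In practice, solving the regularized normal equations with a linear solver (as above) is preferred over forming the inverse explicitly, for numerical stability.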

Though there are many RVFL variants in the literature, some core features of RVFL remain unchanged. In this work, we choose the closed-form based RVFL and investigate the following issues using the 121 UCI datasets, as done in [11].

  1. Effect of the direct links from the input layer to the output layer.

  2. Effect of the bias in the output neuron.

  3. Performance of six commonly used activation functions, as summarized in Table 1.

  4. Performance of the Moore–Penrose pseudoinverse and ridge regression (regularized least squares) for the computation of the output weights.

  5. Effect of the range of the randomly generated parameters in the hidden neurons.

Issues 1–4 in the above list are discussed in Section 2.3, while issue 5 is discussed in Section 2.5.
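To make these design choices concrete, the sketch below (an assumed Python/NumPy illustration, not the authors' code; the activation definitions are common textbook forms rather than the exact ones fixed in Table 1, and the helper name is hypothetical) shows how the direct links, the output bias, the activation function, and the randomization range all enter the matrix that feeds the output layer:

```python
import numpy as np

# Common activation choices; the paper compares six (see Table 1).
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "radbas":  lambda z: np.exp(-z ** 2),
    "sign":    np.sign,
    "hardlim": lambda z: (z >= 0).astype(float),
}

def build_design_matrix(X, n_enh, S=1.0, activation="sigmoid",
                        direct_link=True, output_bias=True, seed=0):
    """Return the matrix D whose columns feed the output layer."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = rng.uniform(-S, S, size=(d, n_enh))       # issue 5: randomization range [-S, +S]
    b = rng.uniform(-S, S, size=n_enh)
    H = ACTIVATIONS[activation](X @ A + b)        # issue 3: activation function
    D = np.hstack([X, H]) if direct_link else H   # issue 1: direct input-output links
    if output_bias:                               # issue 2: bias term in the output neuron
        D = np.hstack([D, np.ones((X.shape[0], 1))])
    return D
```

The resulting D can then be passed to either solver from the previous sketch (issue 4).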


Datasets

All 121 datasets are from the UCI repository [22]. The details of the datasets are summarized in Table 2.

We follow the same procedure as in [11]. Randomized stratified sampling is employed to generate one training set and one test set (each with 50% of the available patterns), such that each class has the same number of training and test patterns. Parameter tuning is performed on this pair of sets to identify the parameters with the best performance on the test set. There are two
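A minimal sketch of such a stratified 50/50 split, assuming scikit-learn is used (the variable names are illustrative; X holds the patterns and y the class labels):

```python
from sklearn.model_selection import train_test_split

# Stratified 50/50 split: each class contributes equally to train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, stratify=y, random_state=0)
```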

Concluding remarks

In this work we presented an extensive and comprehensive evaluation of variants of RVFL with closed-form solutions using 121 UCI datasets [11]. The conclusions of our investigations are as follows:

  1. The effect of the direct links from the input layer to the output layer. It turns out that networks with direct links perform better than those without in all cases, as seen in Table 3.

  2. The effect of the bias in the output layer. It turns out that the bias term in the output neurons only has mixed

Acknowledgment

The authors would like to thank the Guest Editors and the reviewers for their valuable comments. In particular, the authors thank the managing Guest Editor, Associate Professor Dianhui Wang, for suggesting that we investigate the scaling of the randomization. Results presented in Section 2.5 show an overall performance enhancement due to tuning the scaling of the randomization.

References (39)

  • P.L. Bartlett, The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Trans. Inf. Theory (1998).
  • L. Breiman, Arcing classifier (with discussion and a rejoinder by the author), Ann. Stat. (1998).
  • C.P. Chen et al., A rapid learning and dynamic stepwise updating algorithm for flat neural networks and the application to time-series prediction, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (1999).
  • H.-M. Chi et al., A statistical self-organizing learning system for remote sensing classification, IEEE Trans. Geosci. Remote Sens. (2005).
  • L. Cun et al., Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems (1990).
  • S. Dehuri et al., A comprehensive survey on functional link neural networks and an adaptive PSO–BP learning for CFLNN, Neural Comput. Appl. (2010).
  • J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006).
  • J.S. Denker et al., Neural network recognizer for hand-written zip code digits, Advances in Neural Information Processing Systems (1989).
  • M. Fernández-Delgado et al., Do we need hundreds of classifiers to solve real world classification problems?, J. Mach. Learn. Res. (2014).