
Information Sciences

Volumes 367–368, 1 November 2016, Pages 1094-1105

A comprehensive evaluation of random vector functional link networks

https://doi.org/10.1016/j.ins.2015.09.025

Abstract

With randomly generated weights between the input and hidden layers, a random vector functional link (RVFL) network is a universal approximator for continuous functions on compact sets and offers fast learning. Although it was proposed two decades ago, the classification ability of this family of networks has not yet been fully investigated. Through a comprehensive evaluation on 121 UCI datasets, this work investigates the effects of the bias in the output layer, the direct links from the input layer to the output layer, the type of activation function in the hidden layer, the scaling of the parameter randomization, and the solution procedure for the output weights. Surprisingly, we found that the direct links play an important performance-enhancing role in RVFL, while the bias term in the output neuron has no significant effect. The ridge regression based closed-form solution was better than the one based on the Moore–Penrose pseudoinverse. Instead of using a uniform randomization in [-1,+1] for all datasets, tuning the scaling of the uniform randomization range for each dataset enhances the overall performance. Six commonly used activation functions were investigated, and we found that the hardlim and sign activation functions degrade the overall performance. These basic conclusions can serve as general guidelines for designing RVFL network based classifiers.

Introduction

Single layer feedforward neural networks (SLFN) have been widely applied to solve problems such as classification and regression because of their universal approximation capability [14], [17], [20], [31]. Conventional methods for training SLFN are back-propagation based learning algorithms [7], [10]. These iterative methods suffer from slow convergence, can get trapped in local minima, and are sensitive to the learning rate setting. The Random Vector Functional Link network (RVFL), shown in Fig. 1, is a randomized version of the functional link neural network [8], [25]: the actual values of the weights from the input layer to the hidden layer can be randomly generated in a suitable domain and kept fixed during the learning stage. The method independently developed in [35] also belongs to the family of randomized methods for training artificial neural networks with randomized input layer weights. This method [35] does not have direct links between the inputs and the outputs, whereas RVFL has highly beneficial direct links.

RVFL was proposed in [28]. Learning and generalization characteristics of RVFL were discussed in [26]. In [17], Igelnik and Pao proved that the RVFL network is a universal approximator for a continuous function on a bounded finite dimensional set, with a closed-form solution. Since then, RVFL has been employed to solve problems in diverse domains. A dynamic step-wise updating algorithm was proposed in [5] to update the output weights of the RVFL on-the-fly for both a newly added pattern and a newly added enhancement node. The RVFL network was investigated in [37] in the context of modeling and control. The authors of [37] suggested combining unsupervised placement of network nodes according to the input data density with subsequent supervised or reinforcement learning of the linear parameters of the approximator. Modeling conditional probabilities with RVFL was reported in [15].

RVFL can also be combined with other learning methods. In [6], RVFL was combined with statistical hypothesis testing and self-organization of a number of enhancement nodes to generate a new learning system called the statistical self-organizing learning system (SSOLS) for remote sensing applications. In [16], expectation maximization was combined with RVFL to improve its performance. RVFL has also been investigated in the ensemble learning framework. In [1], a decorrelated RVFL ensemble was introduced based on negative correlation learning. An RVFL based multi-source data ensemble for clinker free lime content estimation in rotary kiln sintering processes can be found in [21]. RVFL has also been widely applied to solve real-life problems. In [30], the authors reported the performance of a holistic-styled word-based approach to off-line recognition of English language script, in which a radial basis function neural network and RVFL were combined. Their approach, named the density-based random-vector functional-link net (DBRVFLN), was helpful in improving the performance of word recognition. In [29], RVFL was used in an MPEG-4 coder. In [38], RVFL was applied to pedestrian detection based on a combination of multiple features. In [39], RVFL was combined with AdaBoost in a pedestrian detection system. In [23], the authors investigated the performance of hardware implementation methods for RVFL. In [34], distributed learning of RVFL was proposed, where the training data are distributed under a decentralized information structure.

Consider an RVFL as shown in Fig. 1. As mentioned before, the weights $a_{ij}$ from the input to the enhancement nodes are randomly generated such that the activation functions $g(a_j^T x + b_j)$ are not all saturated. Following the approach in [1], all the weights are generated from a uniform distribution within $[-S, +S]$ in this work, where $S$ is a scale factor to be determined during the parameter tuning stage for each dataset. For RVFL, only the output weights $\beta$ need to be determined, by solving the following problem:
$$t_i = d_i^T \beta, \quad i = 1, 2, \ldots, P, \qquad (1)$$
where $P$ is the number of data samples, $t_i$ is the target and $d_i$ is the vector obtained by concatenating the original features and the random features. Directly solving the problem in Eq. (1) may lead to over-fitting. In practice, a regularization on the solution, such as regularized least squares or a preference for the solution with smaller norm [3], can be adopted. RVFL can be roughly divided into two classes based on the algorithm used to obtain the output weights. One is iterative RVFL, which obtains the output weights in an iterative manner based on the gradient of the error function. The other is closed-form based RVFL, which obtains the output weights in a single step. The present work focuses on closed-form based RVFL because of its efficiency. A straightforward solution within a single learning step can be achieved with a pseudo-inverse [17], [27], among which the Moore–Penrose pseudoinverse, $\beta = D^{+}T$, where $D$ and $T$ are the matrices formed by stacking the features and targets of all data samples, is most commonly used. Another alternative is the $\ell_2$ norm regularized least squares (or ridge regression), which solves the following problem:
$$\min_{\beta} \sum_{i=1}^{P} \left(t_i - d_i^T \beta\right)^2 + \lambda \|\beta\|^2.$$
The solution is given by $\beta = (D^T D + \lambda I)^{-1} D^T T$, where $\lambda$ is the regularization parameter to be tuned.
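For concreteness, the two closed-form solvers just described can be sketched as follows, assuming Python/NumPy. Here D stacks the concatenated (original plus random) features row-wise and T stacks the targets; the function names are illustrative and not taken from the authors' implementation.

```python
import numpy as np

def solve_pseudoinverse(D, T):
    """Moore-Penrose solution: beta = D^+ T."""
    return np.linalg.pinv(D) @ T

def solve_ridge(D, T, lam):
    """Ridge regression solution: beta = (D^T D + lam*I)^(-1) D^T T."""
    m = D.shape[1]
    # Solve the linear system instead of explicitly inverting the matrix.
    return np.linalg.solve(D.T @ D + lam * np.eye(m), D.T @ T)
```

In practice, solving the regularized normal equations with a linear solver (as above) is preferred over forming the inverse explicitly, for numerical stability.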

Though there are many RVFL variants in the literature, some core features of RVFL remain unchanged. In this work, we choose the closed-form based RVFL and investigate the following issues using the 121 UCI datasets, as done in [11].

  1. Effect of the direct links from the input layer to the output layer.

  2. Effect of the bias in the output neuron.

  3. Performance of six commonly used activation functions, as summarized in Table 1.

  4. Performance of the Moore–Penrose pseudoinverse and ridge regression (regularized least squares) for the computation of the output weights.

  5. Effect of the range of the randomly generated parameters in the hidden neurons.

Issues 1–4 in the above list are discussed in Section 2.3, while issue 5 is discussed in Section 2.5.
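To make these design choices concrete, the sketch below (an assumed Python/NumPy illustration, not the authors' code; the activation definitions are common textbook forms rather than the exact ones fixed in Table 1, and the helper name is hypothetical) shows how the direct links, the output bias, the activation function, and the randomization range all enter the matrix that feeds the output layer:

```python
import numpy as np

# Common activation choices; the paper compares six (see Table 1).
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "radbas":  lambda z: np.exp(-z ** 2),
    "sign":    np.sign,
    "hardlim": lambda z: (z >= 0).astype(float),
}

def build_design_matrix(X, n_enh, S=1.0, activation="sigmoid",
                        direct_link=True, output_bias=True, seed=0):
    """Return the matrix D whose columns feed the output layer."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = rng.uniform(-S, S, size=(d, n_enh))       # issue 5: randomization range [-S, +S]
    b = rng.uniform(-S, S, size=n_enh)
    H = ACTIVATIONS[activation](X @ A + b)        # issue 3: activation function
    D = np.hstack([X, H]) if direct_link else H   # issue 1: direct input-output links
    if output_bias:                               # issue 2: bias term in the output neuron
        D = np.hstack([D, np.ones((X.shape[0], 1))])
    return D
```

The resulting D can then be passed to either solver from the previous sketch (issue 4).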


Datasets

All 121 datasets are from the UCI repository [22]. The details of the datasets are summarized in Table 2.

We follow the same procedure as in [11]. Randomized stratified sampling is employed to generate one training set and one test set (each with 50% of the available patterns), such that each class has the same number of training and test patterns. Parameter tuning is performed on this pair of sets to identify the parameters with the best performance on the test set. There are two
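A minimal sketch of such a stratified 50/50 split, assuming scikit-learn is used (the variable names are illustrative; X holds the patterns and y the class labels):

```python
from sklearn.model_selection import train_test_split

# Stratified 50/50 split: each class contributes equally to train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, stratify=y, random_state=0)
```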

Concluding remarks

In this work we presented an extensive and comprehensive evaluation of variants of RVFL with closed-form solutions using 121 UCI datasets [11]. The conclusions of our investigations are as follows:

  1. The effect of the direct links from the input layer to the output layer. It turns out that networks with direct links perform better than those without in all cases, as seen in Table 3.

  2. The effect of the bias in the output layer. It turns out that the bias term in the output neurons only has mixed

Acknowledgment

The authors would like to thank the Guest Editors and the reviewers for their valuable comments. In particular, the authors thank the managing Guest Editor, Associate Professor Dianhui Wang, for suggesting that we investigate the scaling of the randomization. Results presented in Section 2.5 show an overall performance enhancement due to tuning the scaling of the randomization.

References (39)

  • P.L. Bartlett, The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Trans. Inf. Theory (1998).
  • L. Breiman, Arcing classifier (with discussion and a rejoinder by the author), Ann. Stat. (1998).
  • C.P. Chen et al., A rapid learning and dynamic stepwise updating algorithm for flat neural networks and the application to time-series prediction, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (1999).
  • H.-M. Chi et al., A statistical self-organizing learning system for remote sensing classification, IEEE Trans. Geosci. Remote Sens. (2005).
  • L. Cun et al., Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems (1990).
  • S. Dehuri et al., A comprehensive survey on functional link neural networks and an adaptive PSO–BP learning for CFLNN, Neural Comput. Appl. (2010).
  • J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006).
  • J.S. Denker et al., Neural network recognizer for hand-written zip code digits, Advances in Neural Information Processing Systems (1989).
  • M. Fernández-Delgado et al., Do we need hundreds of classifiers to solve real world classification problems?, J. Mach. Learn. Res. (2014).