Neurocomputing

Volume 70, Issues 1–3, December 2006, Pages 489–501

Extreme learning machine: Theory and applications

https://doi.org/10.1016/j.neucom.2005.12.126

Abstract

It is clear that the learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for the past decades. Two key reasons behind this may be: (1) slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. Experimental results on a few artificial and real benchmark function approximation and classification problems, including very large complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.1

Introduction

Feedforward neural networks have been extensively used in many fields due to their ability: (1) to approximate complex nonlinear mappings directly from the input samples; and (2) to provide models for a large class of natural and artificial phenomena that are difficult to handle using classical parametric techniques. On the other hand, fast learning algorithms for feedforward neural networks have been lacking. The traditional learning algorithms are usually far slower than required. It is not surprising that it may take several hours, several days, or even longer to train neural networks using traditional methods.

From a mathematical point of view, research on the approximation capabilities of feedforward neural networks has focused on two aspects: universal approximation on compact input sets and approximation on a finite set of training samples. Many researchers have explored the universal approximation capabilities of standard multilayer feedforward neural networks. Hornik [7] proved that if the activation function is continuous, bounded and nonconstant, then continuous mappings can be approximated in measure by neural networks over compact input sets. Leshno [17] improved the results of Hornik [7] and proved that feedforward networks with a nonpolynomial activation function can approximate (in measure) continuous functions. In real applications, neural networks are trained on a finite training set. For function approximation on a finite training set, Huang and Babri [11] showed that a single-hidden layer feedforward neural network (SLFN) with at most N hidden nodes and with almost any nonlinear activation function can exactly learn N distinct observations. It should be noted that the input weights (linking the input layer to the first hidden layer) and hidden layer biases need to be adjusted in all these previous theoretical research works as well as in almost all practical learning algorithms of feedforward neural networks.

Traditionally, all the parameters of feedforward networks need to be tuned, and thus there exists a dependency between the parameters (weights and biases) of different layers. For the past decades, gradient descent-based methods have mainly been used in various learning algorithms of feedforward neural networks. However, gradient descent-based learning methods are generally very slow due to improper learning steps, or may easily converge to local minima. Moreover, many iterative learning steps may be required by such learning algorithms in order to obtain better learning performance.

It has been shown [23], [10] that SLFNs (with N hidden nodes) with randomly chosen input weights and hidden layer biases (such hidden nodes can thus be called random hidden nodes) can exactly learn N distinct observations. Contrary to the popular belief, reflected in most practical implementations, that all the parameters of feedforward networks need to be tuned, the input weights and first hidden layer biases do not necessarily have to be adjusted in applications. In fact, simulation results on artificial and real large applications in our earlier work [16] have shown that this approach not only makes learning extremely fast but also produces good generalization performance.

In this paper, we first rigorously prove that the input weights and hidden layer biases of SLFNs can be randomly assigned if the activation functions in the hidden layer are infinitely differentiable. After the input weights and the hidden layer biases are chosen randomly, the SLFN can be simply considered as a linear system, and the output weights (linking the hidden layer to the output layer) can be analytically determined through a simple generalized inverse operation on the hidden layer output matrix. Based on this concept, this paper proposes a simple learning algorithm for SLFNs called extreme learning machine (ELM), whose learning speed can be thousands of times faster than traditional feedforward network learning algorithms such as the back-propagation (BP) algorithm, while obtaining better generalization performance. Different from traditional learning algorithms, the proposed learning algorithm tends to reach not only the smallest training error but also the smallest norm of weights. Bartlett's [1] theory on the generalization performance of feedforward neural networks states that, among networks reaching the same small training error, the smaller the norm of the weights, the better the generalization performance the networks tend to have. Therefore, the proposed learning algorithm tends to provide good generalization performance for feedforward neural networks.
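As a brief summary of this step (using the hidden layer output matrix $\mathbf{H}$, output weight matrix $\boldsymbol{\beta}$ and target matrix $\mathbf{T}$ defined formally in Section 2), the output weights are obtained as the minimum norm least-squares solution of the linear system $\mathbf{H}\boldsymbol{\beta}=\mathbf{T}$:

$$\hat{\boldsymbol{\beta}}=\mathbf{H}^{\dagger}\mathbf{T},\qquad \|\mathbf{H}\hat{\boldsymbol{\beta}}-\mathbf{T}\|=\min_{\boldsymbol{\beta}}\|\mathbf{H}\boldsymbol{\beta}-\mathbf{T}\|,$$

where $\mathbf{H}^{\dagger}$ denotes the Moore–Penrose generalized inverse of $\mathbf{H}$; among all least-squares solutions, $\hat{\boldsymbol{\beta}}$ also has the smallest norm.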

As the newly proposed learning algorithm can be easily implemented, tends to reach the smallest training error, obtains the smallest norm of weights and thus good generalization performance, and runs extremely fast, it is called the extreme learning machine in the context of this paper in order to differentiate it from other popular SLFN learning algorithms.

This paper is organized as follows. Section 2 rigorously proves that the input weights and hidden layer biases of SLFNs can be randomly assigned if the activation functions in the hidden layer are infinitely differentiable. Section 3 then proposes the new ELM learning algorithm for SLFNs. Performance evaluation is presented in Section 4. Discussions and conclusions are given in Section 5. The Moore–Penrose generalized inverse and the minimum norm least-squares solution of a general linear system, which play an important role in developing the new ELM learning algorithm, are briefly reviewed in the Appendix.

Section snippets

Single hidden layer feedforward networks (SLFNs) with random hidden nodes

For $N$ arbitrary distinct samples $(\mathbf{x}_i,\mathbf{t}_i)$, where $\mathbf{x}_i=[x_{i1},x_{i2},\ldots,x_{in}]^{T}\in\mathbf{R}^{n}$ and $\mathbf{t}_i=[t_{i1},t_{i2},\ldots,t_{im}]^{T}\in\mathbf{R}^{m}$, standard SLFNs with $\tilde{N}$ hidden nodes and activation function $g(x)$ are mathematically modeled as
$$\sum_{i=1}^{\tilde{N}}\boldsymbol{\beta}_i g_i(\mathbf{x}_j)=\sum_{i=1}^{\tilde{N}}\boldsymbol{\beta}_i\,g(\mathbf{w}_i\cdot\mathbf{x}_j+b_i)=\mathbf{o}_j,\qquad j=1,\ldots,N,$$
where $\mathbf{w}_i=[w_{i1},w_{i2},\ldots,w_{in}]^{T}$ is the weight vector connecting the $i$th hidden node and the input nodes, $\boldsymbol{\beta}_i=[\beta_{i1},\beta_{i2},\ldots,\beta_{im}]^{T}$ is the weight vector connecting the $i$th hidden node and the output nodes, and $b_i$ is the threshold of the $i$th hidden node. $\mathbf{w}_i\cdot\mathbf{x}_j$ denotes the inner product of $\mathbf{w}_i$ and $\mathbf{x}_j$.
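For reference (reconstructed from the definitions above, following the paper's notation), these $N$ equations can be written compactly as $\mathbf{H}\boldsymbol{\beta}=\mathbf{T}$, where

$$\mathbf{H}=\begin{bmatrix} g(\mathbf{w}_1\cdot\mathbf{x}_1+b_1) & \cdots & g(\mathbf{w}_{\tilde{N}}\cdot\mathbf{x}_1+b_{\tilde{N}})\\ \vdots & \ddots & \vdots\\ g(\mathbf{w}_1\cdot\mathbf{x}_N+b_1) & \cdots & g(\mathbf{w}_{\tilde{N}}\cdot\mathbf{x}_N+b_{\tilde{N}}) \end{bmatrix}_{N\times\tilde{N}},\quad \boldsymbol{\beta}=\begin{bmatrix}\boldsymbol{\beta}_1^{T}\\ \vdots\\ \boldsymbol{\beta}_{\tilde{N}}^{T}\end{bmatrix}_{\tilde{N}\times m},\quad \mathbf{T}=\begin{bmatrix}\mathbf{t}_1^{T}\\ \vdots\\ \mathbf{t}_N^{T}\end{bmatrix}_{N\times m},$$

with $\mathbf{H}$ called the hidden layer output matrix of the network.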

Proposed extreme learning machine (ELM)

Based on Theorems 2.1 and 2.2, we propose in this section an extremely simple and efficient method to train SLFNs.
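Training thus consists of three steps: randomly assign the hidden-node parameters $(\mathbf{w}_i, b_i)$, compute the hidden layer output matrix $\mathbf{H}$, and obtain the output weights as $\hat{\boldsymbol{\beta}}=\mathbf{H}^{\dagger}\mathbf{T}$. The following is a minimal sketch of this procedure in Python/NumPy (the paper's own experiments were run in MATLAB); the function names, the sigmoid activation and the uniform random initialization range are illustrative assumptions, not part of the paper:

    import numpy as np

    def elm_train(X, T, n_hidden, seed=None):
        # Sketch of ELM training for an SLFN (illustrative names and choices).
        rng = np.random.default_rng(seed)
        n_features = X.shape[1]
        # Step 1: randomly assign input weights w_i and hidden biases b_i.
        W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
        b = rng.uniform(-1.0, 1.0, size=n_hidden)
        # Step 2: compute the hidden layer output matrix H (N x n_hidden),
        # here with a sigmoid activation g(x) = 1 / (1 + exp(-x)).
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        # Step 3: output weights beta = pinv(H) @ T, the minimum norm
        # least-squares solution of H beta = T.
        beta = np.linalg.pinv(H) @ T
        return W, b, beta

    def elm_predict(X, W, b, beta):
        # Forward pass of the trained SLFN.
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        return H @ beta

    # Tiny synthetic usage example (arbitrary toy data).
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 1))
    T = np.sin(3.0 * X)
    W, b, beta = elm_train(X, T, n_hidden=20, seed=0)
    rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
    print("training RMSE:", rmse)

Note that no iterative tuning is involved: the only numerical work is one matrix product per layer and a single pseudoinverse, which is what makes the learning phase so fast.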

Performance evaluation

In this section, the performance of the proposed ELM learning algorithm is compared with popular algorithms for feedforward neural networks such as the conventional BP algorithm and support vector machines (SVMs) on quite a few real benchmark problems in the function approximation and classification areas. All the simulations for the BP and ELM algorithms are carried out in the MATLAB 6.5 environment running on a Pentium 4, 1.9 GHz CPU.
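The paper's timing comparisons were carried out in the MATLAB environment described above. Purely as an illustration of how a wall-clock training-time measurement could be taken with the NumPy sketch from Section 3 (the dataset size, hidden-node count and target function here are arbitrary assumptions, not the paper's benchmarks), one might write:

    import time
    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((5000, 10))           # synthetic stand-in for a benchmark set
    T = X[:, :1] ** 2 + np.sin(X[:, 1:2])         # arbitrary regression target

    start = time.perf_counter()
    W, b, beta = elm_train(X, T, n_hidden=100, seed=1)   # elm_train from the sketch above
    print(f"ELM training time: {time.perf_counter() - start:.4f} s")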

Discussions and conclusions

This paper has proposed a simple and efficient learning algorithm for single-hidden layer feedforward neural networks (SLFNs) called extreme learning machine (ELM), whose underlying theory has also been rigorously proved in this paper. The proposed ELM has several interesting and significant features that distinguish it from traditional popular gradient-based learning algorithms for feedforward neural networks:

  • (1) The learning speed of ELM is extremely fast. In our simulations, the learning phase of ELM can be completed in

References (26)

  • K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Networks (1991).
  • M. Leshno et al., Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks (1993).
  • P.L. Bartlett, The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network, IEEE Trans. Inf. Theory (1998).
  • C. Blake, C. Merz, UCI repository of machine learning databases, in: ...
  • R. Collobert et al., A parallel mixtures of SVMs for very large scale problems, Neural Comput. (2002).
  • S. Ferrari et al., Smooth function approximation using neural networks, IEEE Trans. Neural Networks (2005).
  • Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: International Conference on Machine Learning, ...
  • S. Haykin, Neural Networks: A Comprehensive Foundation (1999).
  • C.-W. Hsu et al., A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Networks (2002).
  • G.-B. Huang, Learning capability of neural networks, Ph.D. Thesis, Nanyang Technological University, Singapore, ...
  • G.-B. Huang, Learning capability and storage capacity of two-hidden-layer feedforward networks, IEEE Trans. Neural Networks (2003).
  • G.-B. Huang et al., Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions, IEEE Trans. Neural Networks (1998).
  • G.-B. Huang et al., Classification ability of single hidden layer feedforward neural networks, IEEE Trans. Neural Networks (2000).

    Guang-Bin Huang received the B.Sc. degree in applied mathematics and the M.Eng. degree in computer engineering from Northeastern University, PR China, in 1991 and 1994, respectively, and the Ph.D. degree in electrical engineering from Nanyang Technological University, Singapore, in 1999. During his undergraduate studies, he also concurrently studied in the Wireless Communication Department of Northeastern University, PR China.

    From June 1998 to May 2001, he worked as a Research Fellow at the Singapore Institute of Manufacturing Technology (formerly known as Gintic Institute of Manufacturing Technology), where he led and implemented several key industrial projects. Since May 2001, he has been working as an Assistant Professor in the Information Communication Institute of Singapore (ICIS), School of Electrical and Electronic Engineering, Nanyang Technological University. His current research interests include machine learning, computational intelligence, neural networks, and bioinformatics. He serves as an Associate Editor of Neurocomputing. He is a senior member of IEEE.

    Qin-Yu Zhu received the B.Eng. degree from Shanghai Jiao Tong University, China, in 2001. He is currently a Ph.D. student with the Information Communication Institute of Singapore, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research interests include neural networks and evolutionary algorithms. He has published a number of papers in international journals and conferences.

    Chee-Kheong Siew is currently an Associate Professor in the School of EEE, Nanyang Technological University (NTU). From 1995 to 2005, he served as the Head of the Information Communication Institute of Singapore (ICIS) after he managed the transfer of ICIS to NTU and rebuilt the institute in the university environment. He obtained his B.Eng. in Electrical Engineering from the University of Singapore in 1979 and his M.Sc. in Communication Engineering from Imperial College in 1987. After six years in industry, he joined NTU in 1986 and was appointed Head of the Institute in 1996. His current research interests include neural networks, packet scheduling, traffic shaping, admission control, service curves, QoS framework, congestion control and multipath routing. He is a member of IEEE.

    1. For the preliminary idea of the ELM algorithm, refer to "Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks", Proceedings of the International Joint Conference on Neural Networks (IJCNN2004), Budapest, Hungary, 25–29 July 2004.
