ABSTRACT
Autonomous construction of deep neural networks (DNNs) is desirable for data streams because it potentially offers two advantages: a model capacity matched to the problem and a quick reaction to concept drift and shift. While the self-organizing mechanism of DNNs remains an open issue, the task is even more challenging for standard multi-layer DNNs than for different-depth structures, because adding a new layer causes loss of previously trained knowledge. This paper proposes NADINE, a Neural Network with Dynamically Evolved Capacity. NADINE features a fully open structure: both the depth and the width of the network can be evolved from scratch in an online manner, without problem-specific thresholds. NADINE is built on a standard MLP architecture, and the catastrophic forgetting that would otherwise occur during the hidden-layer addition phase is resolved by the proposed soft-forgetting and adaptive-memory methods. The advantages of NADINE, namely its elastic structure and online learning trait, are numerically validated on nine data-stream classification and regression problems, where it improves over prominent algorithms in all problems and handles regression and classification equally well.
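The structure-evolution idea described in the abstract can be illustrated with a toy sketch (our own minimal construction, not the paper's actual NADINE algorithm): an MLP that starts from scratch with one hidden unit and can later grow in width or depth. Here, new hidden units are attached with zero-initialized outgoing weights and a new layer is initialized near the identity, so the network's previously learned function is preserved at the moment of growth, a crude stand-in for the soft-forgetting idea; the class name `EvolvingMLP` and all initialization choices are assumptions for illustration.

```python
import numpy as np

# Toy sketch of a self-evolving MLP (illustration only; NOT the NADINE
# algorithm). Structure starts minimal and grows while preserving the
# function the network currently computes.
class EvolvingMLP:
    def __init__(self, n_in, n_out, rng=None):
        self.rng = rng or np.random.default_rng(0)
        # Start from scratch: one hidden unit between input and output.
        self.W = [self.rng.normal(0.0, 0.1, (n_in, 1)),
                  self.rng.normal(0.0, 0.1, (1, n_out))]

    def forward(self, x):
        h = x
        for W in self.W[:-1]:
            h = np.maximum(h @ W, 0.0)  # ReLU hidden layers
        return h @ self.W[-1]           # linear output layer

    def grow_width(self, layer):
        """Add one hidden unit to `layer`; its outgoing weights are
        zero, so the network's output is unchanged at growth time."""
        n_in = self.W[layer].shape[0]
        new_in = self.rng.normal(0.0, 0.1, (n_in, 1))
        self.W[layer] = np.hstack([self.W[layer], new_in])
        self.W[layer + 1] = np.vstack(
            [self.W[layer + 1],
             np.zeros((1, self.W[layer + 1].shape[1]))])

    def grow_depth(self):
        """Insert a new top hidden layer initialized to the identity.
        Because hidden activations are already non-negative, ReLU acts
        as identity here and previously trained knowledge survives."""
        width = self.W[-1].shape[0]
        self.W.insert(len(self.W) - 1, np.eye(width))
```

A usage check confirms that both growth operations leave the current input-output mapping intact, which is the property the layer-addition phase must protect:

```python
net = EvolvingMLP(3, 2)
x = np.ones((4, 3))
y0 = net.forward(x)
net.grow_width(0)   # wider, same function
net.grow_depth()    # deeper, same function
assert np.allclose(y0, net.forward(x))
```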
- Automatic Construction of Multi-layer Perceptron Network from Streaming Examples