An Aggregate Ensemble for Mining Concept Drifting Data Streams with Noise

Zhang, Peng; Zhu, Xingquan; Shi, Yong; Wu, Xindong

doi:10.1007/978-3-642-01307-2_109


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5476)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Recent years have witnessed a large body of research on mining concept drifting data streams, where a primary assumption is that the up-to-date data chunk and the yet-to-come data chunk share identical distributions, so that classifiers that perform well on the up-to-date chunk are expected to predict the yet-to-come chunk accurately as well. This "stationary assumption", however, does not capture the reality of concept drift in data streams. More recently, a "learnable assumption" has been proposed, which allows the distribution of each data chunk to evolve randomly. Although this assumption can describe concept drift in data streams, it is still inadequate for representing real-world data streams, which usually suffer from noisy data as well as drifting concepts. In this paper, we propose a Realistic Assumption, which asserts that the difficulties of mining data streams are mainly caused by both concept drift and noisy data chunks. Accordingly, we present a new Aggregate Ensemble (AE) framework, which trains base classifiers using different learning algorithms on different data chunks. All the base classifiers are then combined into a classifier ensemble through model averaging. Experimental results on synthetic and real-world data show that AE outperforms other ensemble methods under the new realistic assumption for noisy data streams.
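The abstract describes AE only at a high level: base classifiers are trained with different learning algorithms on different data chunks and then combined by model averaging. As a minimal sketch, assuming m learning algorithms, n chunks and uniform weights, one reading of that combination is P_AE(y | x) = (1/(mn)) Σ_i Σ_j P_ij(y | x), where P_ij is the model trained by algorithm i on chunk j. The Python below illustrates that reading only; it is not the authors' implementation, and the two base learners (nearest centroid and k-nearest neighbours), the uniform weighting, every function name and the toy chunks are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration only.
Vector = Tuple[float, ...]
Chunk = List[Tuple[Vector, int]]              # one data chunk: (features, label) pairs
Model = Callable[[Vector], Dict[int, float]]  # maps x to class-probability estimates


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def train_nearest_centroid(chunk: Chunk) -> Model:
    """Base learner A (assumed): all probability mass on the closest class centroid."""
    sums: Dict[int, List[float]] = {}
    counts: Counter = Counter()
    for x, y in chunk:
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] += 1
    centroids = {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}

    def predict(x: Vector) -> Dict[int, float]:
        best = min(centroids, key=lambda y: sq_dist(x, centroids[y]))
        return {best: 1.0}
    return predict


def train_knn(chunk: Chunk, k: int = 3) -> Model:
    """Base learner B (assumed): label frequencies among the k nearest examples."""
    def predict(x: Vector) -> Dict[int, float]:
        nearest = sorted(chunk, key=lambda ex: sq_dist(x, ex[0]))[:k]
        votes = Counter(y for _, y in nearest)
        return {y: c / len(nearest) for y, c in votes.items()}
    return predict


def aggregate_ensemble(chunks: Sequence[Chunk],
                       learners: Sequence[Callable[[Chunk], Model]]) -> Model:
    """Train every learner on every chunk (m x n models), then average their outputs uniformly."""
    models = [learn(chunk) for learn in learners for chunk in chunks]

    def predict(x: Vector) -> Dict[int, float]:
        avg: Dict[int, float] = {}
        for model in models:
            for y, p in model(x).items():
                avg[y] = avg.get(y, 0.0) + p / len(models)
        return avg
    return predict


def classify(model: Model, x: Vector) -> int:
    """Return the class with the highest averaged probability."""
    probs = model(x)
    return max(probs, key=probs.get)


# Toy usage: three small chunks; chunk3 contains one deliberately flipped label (noise).
chunk1 = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((1.0, 1.0), 1), ((0.9, 1.1), 1)]
chunk2 = [((0.1, 0.0), 0), ((0.0, 0.3), 0), ((1.2, 0.9), 1), ((1.0, 0.8), 1)]
chunk3 = [((0.1, 0.1), 1), ((1.1, 1.0), 1), ((0.0, 0.2), 0), ((0.9, 0.9), 1)]
ae = aggregate_ensemble([chunk1, chunk2, chunk3], [train_nearest_centroid, train_knn])
print(classify(ae, (0.05, 0.05)))  # 0
print(classify(ae, (1.0, 1.0)))    # 1
```

Because every (algorithm, chunk) pair contributes equally in this sketch, one noisy chunk or one poorly suited learner shifts the averaged probabilities without deciding the prediction on its own. In the toy run, the flipped label in chunk3 makes that chunk's k-NN model favour class 1 at (0.05, 0.05), but the averaged estimate still favours class 0.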



Author information

Authors and Affiliations

FEDS Center, Chinese Academy of Sciences, Beijing, 100190, China
Peng Zhang & Yong Shi
Dept. of Computer Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, 33431, USA
Xingquan Zhu
College of Inform. Sci. & Tech., Univ. of Nebraska at Omaha, Omaha, NE 68182, USA
Yong Shi
Dept. of Computer Science, University of Vermont, Burlington, Vermont, 05405, USA
Xindong Wu


Editor information

Editors and Affiliations

Sirindhorn International Institute of Technology, Thammasat University, 131 Moo 5 Tiwanont Road, 12000, Bangkadi, Muang, Pathumthani, Thailand
Thanaruk Theeramunkong
Dept. of Computer Engineering, Faculty of Engineering, Chulalongkorn University, 10330, Bangkok, Thailand
Boonserm Kijsirikul
Faculty of Science & Engineering, York University, 355 Lumbers Building, 4700 Keele Street, M3J 1P3, Toronto, Ontario, Canada
Nick Cercone
School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, 923-1292, Ishikawa, Japan
Tu-Bao Ho


About this paper

Cite this paper

Zhang, P., Zhu, X., Shi, Y., Wu, X. (2009). An Aggregate Ensemble for Mining Concept Drifting Data Streams with Noise. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_109


DOI: https://doi.org/10.1007/978-3-642-01307-2_109
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science; Computer Science (R0)
