Characterizing concept drift

Webb, Geoffrey I.; Hyde, Roy; Cao, Hong; Nguyen, Hai Long; Petitjean, Francois

doi:10.1007/s10618-015-0448-4

Published: 15 April 2016

Volume 30, pages 964–994, (2016)

Data Mining and Knowledge Discovery

Abstract

Most machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or, as it is commonly called, concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.

Acknowledgments

We are grateful to David Albrecht, Mark Carman, Bart Goethals, Nayyar Zaidi and the anonymous reviewers for valuable comments and suggestions. This research has been supported by the Australian Research Council under grant DP140100087 and by the Asian Office of Aerospace Research and Development, Air Force Office of Scientific Research, under contract FA2386-15-1-4007.

Author information

Authors and Affiliations

Faculty of Information Technology, Monash University, Clayton, VIC, 3800, Australia
Geoffrey I. Webb, Roy Hyde & Francois Petitjean
McLaren Applied Technologies Pte Ltd APAC, Suntec Tower One, Singapore, 038987, Singapore
Hong Cao & Hai Long Nguyen

Corresponding author

Correspondence to Geoffrey I. Webb.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Responsible editor: Johannes Fürnkranz.

About this article

Cite this article

Webb, G.I., Hyde, R., Cao, H. et al. Characterizing concept drift. Data Min Knowl Disc 30, 964–994 (2016). https://doi.org/10.1007/s10618-015-0448-4

Received: 01 March 2015
Accepted: 10 December 2015
Published: 15 April 2016
Issue Date: July 2016
DOI: https://doi.org/10.1007/s10618-015-0448-4
