Abstract
Most machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or as it is commonly called concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.
Similar content being viewed by others
References
Aggarwal CC (2009) Data streams: an overview and scientific applications. Springer, Berlin, pp 377–397. doi:10.1007/978-3-642-02788-8_14
Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Proceedings of the 29th international conference on very large data bases, VLDB Endowment, 29:81–92
Angluin D (1988) Queries and concept learning. Mach Learn 2(4):319–342
Babcock B, Datar M, Motwani R (2002) Sampling from a moving window over streaming data. In: Proceedings of the thirteenth annual ACM-SIAM symposium on discrete algorithms, Society for Industrial and Applied Mathematics, pp 633–634
Baena-Garcıa M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavalda R, Morales-Bueno R (2006) Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams, 6:77–86
Bartlett PL, Ben-David S, Kulkarni SR (2000) Learning changing concepts by exploiting the structure of change. Mach Learn 41(2):153–174
Bifet A, Gama J, Pechenizkiy M, Zliobaite I (2011) Handling concept drift: importance, challenges and solutions. PAKDD-2011 Tutorial, Shenzhen, China
Bifet A, Gavaldà R (2009) Adaptive learning from evolving data streams. In: Advances in intelligent data analysis VIII, Springer, 249–260
Bifet A, Holmes G, Kirkby R, Pfahringer B (2010a) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
Bifet A, Holmes G, Pfahringer B (2010b) Leveraging bagging for evolving data streams. In: Machine learning and knowledge discovery in databases, Springer, pp 135–150
Bose RJC, van der Aalst WMP, Zliobaite I, Pechenizkiy M (2011) Handling concept drift in process mining. In: Haralambos M, Colette R (eds) Advanced information systems engineering., Lecture notes in computer science, Springer, Berlin, pp 391–405. doi:10.1007/978-3-642-21640-4_30
Brzezinski D (2014a) Reacting to different types of concept drift: the accuracy updated ensemble algorithm. Neural Netw Learn Syst IEEE Trans 25(1):81–94. doi:10.1109/TNNLS.2013.2251352
Brzeziński D (2010) Mining data streams with concept drift. Master’s thesis, Poznan University of Technology
Brzezinski D, Stefanowski J (2014b) Reacting to different types of concept drift: the accuracy updated ensemble algorithm. Neural Netw Learn Syst IEEE Trans 25(1):81–94
Brzezinski D, Stefanowski J (2014c) Prequential AUC for classifier evaluation and drift detection in evolving data streams. In: Proceedings of the 3rd international workshop on new frontiers in mining complex patterns, Nancy
Cieslak DA, Chawla NV (2009) A framework for monitoring classifiers performance: when and why failure occurs? Knowl Inform Syst 18(1):83–108 ISSN 0219-1377
Dongre PB, Malik LG (2014) A review on real time data stream classification and adapting to various concept drift scenarios. In: Advance computing conference (IACC), 2014 IEEE international, pp 533–537, doi:10.1109/IAdCC.2014.6779381
Dries Anton, Rückert Ulrich (2009) Adaptive concept drift detection. Stat Anal Data Min 2(5–6):311–327
Gaber Mohamed Medhat, Zaslavsky Arkady, Krishnaswamy Shonali (2005) Mining data streams: a review. ACM Sigmod Rec 34(2):18–26
Gama J, Zliobaite I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4):44:1–44:37. doi:10.1145/2523813 ISSN 0360–0300
Gama J, Rodrigues P (2009) An overview on mining data streams, volume 206 of studies in computational intelligence. Springer, Berlin. doi:10.1007/978-3-642-01091-0_2
Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In Ana LC, Bazzan, Sofiane L (ed), Advances in artificial intelligence SBIA
Gama J, Medas P, G Castillo, Rodrigues P (2004) Learning with drift detection. Advances in artificial intelligence—SBIA 2004. Springer, New York, pp 286–295
Gomes JB, Menasalvas E, Sousa PAC (2011) Learning recurring concepts from data streams with a context-aware ensemble. In: Proceedings of the 2011 ACM symposium on applied computing, SAC ’11, ACM, New York, pp 994–999. doi:10.1145/1982185.1982403
Hoens TR, Chawla NV, Polikar R (2011) Heuristic updatable weighted random subspaces for non-stationary environments. In Diane JC, Jian P, Wei W, Osmar RZ, Xindong W (ed), IEEE international conference on data mining, ICDM-11, IEEE, pp 241–250
Hoens TR, Polikar R, Chawla NV (2012) Learning from streaming data with concept drift and imbalance: an overview. Prog Artif Intell 1(1):89–101. doi:10.1007/s13748-011-0008-0
Huang DTJ, Koh YS, Gillian D, Pears R (2013) Tracking drift types in changing data streams. In: Hiroshi M, Wu Z, Cao L, Zaiane O, Min Y, Wei W (eds) Advanced data mining and applications. Lecture notes in computer science. Springer, Berlin, pp 72–83. doi:10.1007/978-3-642-53914-5_7
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, KDD-01, ACM, pp 97–106
Jiang N, Gruenwald L (2006) Research issues in data stream association rule mining. ACM SIGMOD Rec 35(1):14–19
Kelly MG, Hand DJ, Adams NM (1999) The impact of changing populations on classifier performance. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining, KDD-99, New York, ACM, pp 367–371. doi:10.1145/312129.312285
Kosina Petr, Gama João, Sebastião Raquel (2010) Drift severity metric. European Conference on Artificial Intelligence, ECAI 2010:1119–1120
Krempl G, Zliobaite I, Brzezinski D, Hullermeier E, Last M, Lemaire V, Noack T, Shaker A, Sievi S, Spiliopoulou M, Stefanowski J (2014) Open challenges for data stream mining research. In: ACM SIGKDD explorations newsletter, vol 16–1, pp 1–10
Kuh A, Petsche T, Rivest RL (1991) Learning time-varying concepts. In: Advances in neural information processing systems, pp 183–189
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Kuncheva LI (2004) Classifier ensembles for changing environments. In: Multiple Classifier Systems. Springer, pp 1–15
Masud MM, Gao J, Khan L, Han J, Thuraisingham B (2011) Classification and novel class detection in concept-drifting data streams under time constraints. IEEE Trans Knowl Data Eng 23(6):859–874
Michalski RS (1983) A theory and methodology of inductive learning. Springer, New York
Minku FL, Yao X (2009) Using diversity to handle concept drift in on-line learning. In: International joint conference on neural networks, IJCNN-09, IEEE, pp 2125–2132
Minku LL, White AP, Xin Y (2010) The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans Knowl Data Eng 22(5):730–742. doi:10.1109/TKDE.2009.156 ISSN 1041–4347
Moreno-Torres Jose G, Raeder Troy, Alaiz-Rodrguez Rocio, Chawla Nitesh V, Herrera Francisco (2012) A unifying view on dataset shift in classification. Pattern Recognit 45(1):521–530 ISSN 0031-3203
Narasimhamurthy A, Kuncheva L (2007) A framework for generating data to simulate changing environments. In: Proceedings of the 25th IASTED international multi-conference: artificial intelligence and applications, ACTA Press, 549: p 389
Nguyen H-L, Woon Y-K, Ng W-K, Wan L (2012) Heterogeneous ensemble for feature drifts in data streams. In: Advances in knowledge discovery and data mining. Springer, pp 1–12
Nguyen H-L, Woon Y-K, Ng W-K (2014) A survey on data stream clustering and classification. Knowl Inf Syst pp 1–35
Nishida Kyosuke, Yamauchi K (2007) Detecting concept drift using statistical testing. In: Discovery Science, Springer, pp 264–269
Oza NC, Russell S (2001) Online bagging and boosting. In: Artificial Intelligence and Statistics 2001, Morgan Kaufmann pp 105–112
Pfahringer B, Holmes G, Kirkby R (2007) New options for Hoeffding trees. In: Mehmet O, John T (eds) AI 2007: advances in artificial intelligence, 4830th edn., Lecture notes in computer scienceSpringer, New York, pp 90–99. doi:10.1007/978-3-540-76928-6_11
Quionero-Candela J, Sugiyama M, Schwaighofer A, Lawrence ND (2009) Dataset shift in machine learning. The MIT Press, Cambridge
Shaker A, Hullermeier E (2015) Recovery analysis for adaptive learning from non-stationary data streams. In: Neurocomputing, ScienceDirect, pp 250–264
Subramaniam S, Palpanas T, Papadopoulos D, Kalogeraki V, Gunopulos D (2006) Online outlier detection in sensor data using non-parametric models. In: Proceedings of the 32nd international conference on very large data bases, VLDB Endowment, pp 187–198
Tsymbal A (2004) The problem of concept drift: definitions and related work. Technical Report TCD-CS-2004-15, The University of Dublin, Trinity College, Department of Computer Science, Dublin
Wetzel L (2009) Types and tokens. In: Zalta EN (ed) The Stanford Encyclopedia of Philosophy. http://plato.stanford.edu/archives/spr2014/entries/types-tokens/
Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, KDD-03, New York, ACM, pp 226–235. doi:10.1145/956750.956778
Wang H, Fan W, Yu PS, Han J (2003b) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, KDD-03, ACM, pp 226–235
Wang S, Minku LL, Ghezzi D, Caltabiano D, Tino P, Yao X (2013) Concept drift detection for online class imbalance learning. In: The 2013 international joint conference on neural Network, IJCNN-13, IEEE, pp 1–10
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101. doi:10.1007/BF00116900 ISSN 0885–6125
Zhang P, Zhu X, Shi Y (2008) Categorizing and mining concept drifting data streams. In: Proceeding of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD-08, ACM, pp 812–820. doi:10.1145/1401890.1401987
Zliobaite I (2010) Learning under concept drift: an overview. Technical report
Zliobaite I (2014) Controlled permutation for testing adaptive learning models. Knowledge and information systems, vol 39. Springer, London, pp 565–578
Acknowledgments
We are grateful to David Albrecht, Mark Carman, Bart Goethals, Nayyar Zaidi and the anonymous reviewers for valuable comments and suggestions. This research has been supported by the Australian Research Council under grant DP140100087 and Asian Office of Aerospace Research and Development, Air Force Office of Scientific Research under contract FA2386-15-1-4007.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Responsible editor: Johannes Fürnkranz.
Rights and permissions
About this article
Cite this article
Webb, G.I., Hyde, R., Cao, H. et al. Characterizing concept drift. Data Min Knowl Disc 30, 964–994 (2016). https://doi.org/10.1007/s10618-015-0448-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-015-0448-4