Characterizing concept drift

Data Mining and Knowledge Discovery

Abstract

Most machine learning models are static, but the world is dynamic, and increasing online deployment of learned models gives increasing urgency to the development of efficient and effective mechanisms to address learning in the context of non-stationary distributions, or, as it is commonly called, concept drift. However, the key issue of characterizing the different types of drift that can occur has not previously been subjected to rigorous definition and analysis. In particular, while some qualitative drift categorizations have been proposed, few have been formally defined, and the quantitative descriptions required for precise and objective understanding of learner performance have not existed. We present the first comprehensive framework for quantitative analysis of drift. This supports the development of the first comprehensive set of formal definitions of types of concept drift. The formal definitions clarify ambiguities and identify gaps in previous definitions, giving rise to a new comprehensive taxonomy of concept drift types and a solid foundation for research into mechanisms to detect and address concept drift.

Acknowledgments

We are grateful to David Albrecht, Mark Carman, Bart Goethals, Nayyar Zaidi and the anonymous reviewers for valuable comments and suggestions. This research has been supported by the Australian Research Council under grant DP140100087 and by the Asian Office of Aerospace Research and Development, Air Force Office of Scientific Research, under contract FA2386-15-1-4007.

Author information

Corresponding author

Correspondence to Geoffrey I. Webb.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Responsible editor: Johannes Fürnkranz.

About this article

Cite this article

Webb, G.I., Hyde, R., Cao, H. et al. Characterizing concept drift. Data Min Knowl Disc 30, 964–994 (2016). https://doi.org/10.1007/s10618-015-0448-4

  • DOI: https://doi.org/10.1007/s10618-015-0448-4
