Mining data streams with concept drifts using genetic algorithm

Artificial Intelligence Review

Abstract

Recent research shows that rule-based models perform well when classifying large data sets such as data streams with concept drifts. The genetic algorithm is a strong rule-based classification technique, but it has been used only for mining small, static data sets. If it can be made scalable and adaptable by reducing its I/O intensity, it becomes an efficient and effective tool for mining large data sets such as data streams. This paper proposes a scalable and adaptable online genetic algorithm that mines classification rules from data streams with concept drifts. Because data streams are generated continuously and at a rapid rate, the proposed method does not use a fixed, static data set for fitness calculation. Instead, whenever data is required for the fitness calculation, it extracts a small snapshot of training examples from the current part of the stream. The method also builds the rules for each class separately, in a parallel, independent, and iterative manner. This makes it scalable to data streams and lets it adapt to concept drifts quickly and naturally, without storing the whole stream, or part of it in compressed form, as other rule-based algorithms do. The results of the proposed method are comparable with those of other standard methods for mining data streams.

Author information

Corresponding author

Correspondence to Periasamy Vivekanandan.

About this article

Cite this article

Vivekanandan, P., Nedunchezhian, R. Mining data streams with concept drifts using genetic algorithm. Artif Intell Rev 36, 163–178 (2011). https://doi.org/10.1007/s10462-011-9209-y

  • DOI: https://doi.org/10.1007/s10462-011-9209-y
