Abstract
Recent research shows that rule-based models perform well when classifying large data sets such as data streams with concept drifts. A genetic algorithm is a strong rule-based classification algorithm, but it has been used only for mining small, static data sets. If the genetic algorithm can be made scalable and adaptable by reducing its I/O intensity, it will become an efficient and effective tool for mining large data sets such as data streams. In this paper, a scalable and adaptable online genetic algorithm is proposed to mine classification rules for data streams with concept drifts. Since data streams are generated continuously at a rapid rate, the proposed method does not use a fixed, static data set for fitness calculation. Instead, it extracts a small snapshot of training examples from the current part of the data stream whenever data is required for the fitness calculation. The proposed method also builds rules for all the classes separately, in a parallel, independent, and iterative manner. This makes the proposed method scalable to data streams and lets it adapt quickly and naturally to the concept drifts that occur in the stream, without storing the whole stream or a part of the stream in a compressed form, as other rule-based algorithms do. The results of the proposed method are comparable with those of other standard methods used for mining data streams.
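The snapshot idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the interval rule encoding, the accuracy-style fitness, the mutation-only update, the synthetic drifting stream, and every name used here (`drifting_stream`, `take_snapshot`, `evolve_step`, `train`) are assumptions made for illustration. The class populations are also updated one after another in a loop rather than truly in parallel.

```python
import random
from itertools import islice

def drifting_stream(rng, drift_at=1000):
    """Hypothetical stream: label is 1 for x > 0.5, then flips after drift_at."""
    n = 0
    while True:
        x = rng.random()
        label = int(x > 0.5) if n < drift_at else int(x < 0.5)
        yield (x,), label
        n += 1

def take_snapshot(stream, size):
    """Read the next `size` examples from the stream; nothing older is kept."""
    return list(islice(stream, size))

def covers(rule, x):
    """A rule is a list of (low, high) intervals, one per attribute."""
    return all(lo <= v <= hi for (lo, hi), v in zip(rule, x))

def fitness(rule, target, snap):
    """Share of snapshot examples where 'rule covers x' matches 'label == target'."""
    if not snap:
        return 0.0
    agree = sum(covers(rule, x) == (y == target) for x, y in snap)
    return agree / len(snap)

def random_rule(rng, n_attrs=1):
    return [tuple(sorted((rng.random(), rng.random()))) for _ in range(n_attrs)]

def mutate(rule, rng, step=0.1):
    """Shift both bounds of one randomly chosen interval."""
    i = rng.randrange(len(rule))
    lo, hi = rule[i]
    a = lo + rng.uniform(-step, step)
    b = hi + rng.uniform(-step, step)
    child = list(rule)
    child[i] = (min(a, b), max(a, b))
    return child

def evolve_step(population, target, snap, rng):
    """One generation: keep the better half, refill with mutated survivors."""
    ranked = sorted(population, key=lambda r: fitness(r, target, snap), reverse=True)
    survivors = ranked[:max(1, len(ranked) // 2)]
    children = [mutate(rng.choice(survivors), rng)
                for _ in range(len(ranked) - len(survivors))]
    return survivors + children

def train(stream, classes, rng, generations=100, snap_size=50, pop_size=20):
    """Evolve one rule population per class, each on fresh snapshots."""
    pops = {c: [random_rule(rng) for _ in range(pop_size)] for c in classes}
    for _ in range(generations):
        for c in classes:
            snap = take_snapshot(stream, snap_size)  # fresh data for every fitness pass
            pops[c] = evolve_step(pops[c], c, snap, rng)
    return pops
```

Calling `train(drifting_stream(rng), [0, 1], rng)` with `rng = random.Random(0)` returns one population per class. Because each generation is scored on examples read from the current position of the stream, rules that fit the old concept lose fitness once the drift point has passed, which is the adaptation mechanism the abstract describes.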
Cite this article
Vivekanandan, P., Nedunchezhian, R. Mining data streams with concept drifts using genetic algorithm. Artif Intell Rev 36, 163–178 (2011). https://doi.org/10.1007/s10462-011-9209-y