ABSTRACT
Associative classification is a well-known technique for structured data classification. Most previous work on associative classification uses support-based pruning for rule extraction, with the threshold value usually set to 1%. This threshold keeps rule extraction tractable and, on average, yields good accuracy. We believe this threshold may be inappropriate in some cases, since it does not take the class distribution of the dataset into account. In this paper we investigate the effect of the support threshold on classification accuracy. Lower support thresholds are often infeasible with current extraction algorithms, or may cause the generation of a huge rule set. To observe the effect of varying the support threshold, we first propose a compact form to encode a complete rule set. We then develop a new classifier, named L3G, based on this compact form. Taking advantage of the compact form, the classifier can also be built from rules with rather low support. We ran a variety of experiments with different support thresholds on datasets from the UCI machine learning repository. The experiments showed that optimal accuracy is obtained for variable threshold values, sometimes lower than 1%.
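To make the role of the support threshold concrete, the following Python sketch mines class association rules from a toy dataset and prunes them by minimum support. This is only a minimal illustration of support-based pruning, not the L3G classifier or its compact rule-set encoding; the function names, the dataset, and the restriction to short rule bodies are all assumptions made for brevity.

```python
# Minimal sketch of support-based pruning for class association rules.
# Hypothetical illustration only; not the L3G algorithm from the paper.
from itertools import combinations
from collections import Counter

def mine_cars(records, min_sup, max_len=2):
    """Extract rules (itemset -> class label) whose support, i.e. the
    fraction of records containing both the itemset and the class label,
    is at least min_sup. Rule bodies are capped at max_len items."""
    n = len(records)
    counts = Counter()
    for items, label in records:
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                counts[(body, label)] += 1
    return {rule: c / n for rule, c in counts.items() if c / n >= min_sup}

# Toy dataset: (set of items, class label).
records = [
    ({"a", "b"}, "yes"), ({"a", "c"}, "yes"),
    ({"b", "c"}, "no"),  ({"a", "b", "c"}, "yes"),
    ({"c"}, "no"),
]

# Lowering the threshold enlarges the extracted rule set, which is why
# low-support mining benefits from a compact rule-set representation.
for min_sup in (0.5, 0.2, 0.1):
    print(f"min_sup={min_sup}: {len(mine_cars(records, min_sup))} rules")
```

Even on this tiny dataset the rule count grows quickly as the threshold drops, which illustrates why a complete low-support rule set can become huge and why a compact encoding is needed to build a classifier from it.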