ABSTRACT
Microarray datasets typically contain a large number of columns but a small number of rows. Association rules have proven useful in analyzing such datasets. However, most existing association rule mining algorithms cannot efficiently handle datasets with a large number of columns. Moreover, the number of association rules generated from such datasets is enormous because of the large number of possible column combinations. In this paper, we describe a new algorithm called FARMER that is specially designed to discover association rules from microarray datasets. Instead of finding individual association rules, FARMER finds interesting rule groups, each of which is essentially a set of rules generated from the same set of rows. Unlike conventional rule mining algorithms, FARMER searches for interesting rules in the row enumeration space and exploits all user-specified constraints, including minimum support, confidence, and chi-square, to support efficient pruning. Several experiments on real bioinformatics datasets show that FARMER is orders of magnitude faster than previous association rule mining algorithms.
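To make the terms in the abstract concrete, the following is a minimal Python sketch of what support, confidence, and chi-square mean for a rule "itemset -> class", and of how rules that are generated from the same set of rows fall into one group. It is not the FARMER algorithm: it enumerates column combinations by brute force, which is exactly the cost that row enumeration is meant to avoid. The toy data, the class labels, and the function names `row_support_set`, `rule_stats`, and `group_by_rows` are all hypothetical, and the measures follow the standard textbook definitions rather than anything stated in the abstract.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each row is (set of items, class label).
ROWS = [
    ({"a", "b", "c"}, "C"),
    ({"a", "b"}, "C"),
    ({"a", "b", "d"}, "C"),
    ({"c", "d"}, "notC"),
    ({"d"}, "notC"),
]


def row_support_set(itemset, rows=ROWS):
    """Return the indices of the rows that contain every item in `itemset`."""
    return frozenset(i for i, (items, _) in enumerate(rows) if itemset <= items)


def rule_stats(itemset, target="C", rows=ROWS):
    """Return (support, confidence, chi-square) for the rule itemset -> target.

    Assumed definitions: support is the number of rows matching both the
    itemset and the target class; confidence is that count divided by the
    number of rows matching the itemset; chi-square is computed on the
    2x2 contingency table of (itemset present?) x (class is target?).
    """
    n = len(rows)
    covered = row_support_set(itemset, rows)
    x = len(covered)                                      # rows matching itemset
    y = sum(1 for i in covered if rows[i][1] == target)   # ... and target class
    m = sum(1 for _, label in rows if label == target)    # rows in target class
    confidence = y / x if x else 0.0
    a, b = y, x - y                    # itemset present: target / not target
    c, d = m - y, (n - x) - (m - y)    # itemset absent:  target / not target
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return y, confidence, chi2


def group_by_rows(items, rows=ROWS):
    """Group non-empty itemsets over `items` by the set of rows they match."""
    groups = defaultdict(list)
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            matched = row_support_set(set(combo), rows)
            if matched:
                groups[matched].append(combo)
    return dict(groups)
```

On this toy data, `group_by_rows({"a", "b"})` places the itemsets `("a",)`, `("b",)`, and `("a", "b")` in a single group because all three match rows 0, 1, and 2; every rule in such a group shares the same support, confidence, and chi-square, which is why reporting one group rather than each rule shortens the output.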
- FARMER: finding interesting rule groups in microarray datasets