Article

FARMER: finding interesting rule groups in microarray datasets

Authors:
Gao Cong (Natl. University of Singapore)
Anthony K. H. Tung (Natl. University of Singapore)
Xin Xu (Natl. University of Singapore)
Feng Pan (Natl. University of Singapore)
Jiong Yang (University of Illinois, Urbana Champaign)

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, June 2004, Pages 143–154
https://doi.org/10.1145/1007568.1007587

Published: 13 June 2004

ABSTRACT

Microarray datasets typically contain a large number of columns but a small number of rows. Association rules have proved useful in analyzing such datasets. However, most existing association rule mining algorithms cannot efficiently handle datasets with a large number of columns. Moreover, the number of association rules generated from such datasets is enormous because of the large number of possible column combinations.

In this paper, we describe a new algorithm called FARMER that is specially designed to discover association rules from microarray datasets. Instead of finding individual association rules, FARMER finds interesting rule groups, each of which is essentially a set of rules generated from the same set of rows. Unlike conventional rule mining algorithms, FARMER searches for interesting rules in the row enumeration space and exploits all user-specified constraints, including minimum support, confidence, and chi-square, to support efficient pruning. Several experiments on real bioinformatics datasets show that FARMER is orders of magnitude faster than previous association rule mining algorithms.
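To make the notion of a rule group concrete, here is a minimal Python sketch under stated assumptions: the toy dataset, the function names, and the terms "upper bound" (the most specific antecedent in a group) and "lower bounds" (its minimal antecedents) are illustrative choices for this sketch, not taken from the paper's text. It enumerates row sets depth-first, takes the items shared by every row in a set as the group's most specific antecedent, and scores the rule "antecedent => target class". Because every rule in a group is supported by the same rows, all of them share one support, confidence, and chi-square value. This is not FARMER's algorithm: the only pruning here is stopping once the shared itemset becomes empty, and the support, confidence, and chi-square thresholds are applied as a filter afterwards rather than pushed into the search as the abstract describes.

```python
from itertools import combinations

# Toy data invented for illustration: each row is one sample, described by the
# set of (discretized) expression items it contains plus a class label.
ROWS = [
    (frozenset("abce"), "C"),
    (frozenset("abc"), "C"),
    (frozenset("ad"), "N"),
    (frozenset("bde"), "N"),
]


def support_set(itemset, rows):
    """Indices of the rows that contain every item of `itemset`."""
    return frozenset(i for i, (items, _) in enumerate(rows) if itemset <= items)


def chi_square(a, x, m, n):
    """2x2 chi-square: x rows match the antecedent, m rows have the class, a match both."""
    b, c = x - a, m - a
    d = n - x - m + a
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def upper_bounds(rows):
    """Row enumeration: map each supporting row set to the items shared by those rows."""
    groups = {}

    def extend(start, common):
        for i in range(start, len(rows)):
            shared = common & rows[i][0]
            if not shared:  # adding rows can only shrink the shared itemset: prune
                continue
            groups[support_set(shared, rows)] = shared
            extend(i + 1, shared)

    extend(0, frozenset().union(*(items for items, _ in rows)))
    return groups


def lower_bounds(upper, rows):
    """Minimal subsets of `upper` that are supported by exactly the same rows."""
    target = support_set(upper, rows)
    found = []
    for k in range(1, len(upper) + 1):
        for combo in combinations(sorted(upper), k):
            s = frozenset(combo)
            if support_set(s, rows) == target and not any(lb <= s for lb in found):
                found.append(s)
    return found


def rule_groups(rows, target, min_sup, min_conf, min_chi):
    """Rule groups 'antecedent => target' passing all three thresholds (post-filter)."""
    n = len(rows)
    m = sum(1 for _, label in rows if label == target)
    result = []
    for row_set, upper in upper_bounds(rows).items():
        x = len(row_set)  # rows matching every antecedent in the group
        a = sum(1 for i in row_set if rows[i][1] == target)  # ...that also have the class
        conf = a / x
        chi = chi_square(a, x, m, n)
        if a >= min_sup and conf >= min_conf and chi >= min_chi:
            result.append((upper, lower_bounds(upper, rows), a, conf, chi))
    return result


if __name__ == "__main__":
    # 3.84 is roughly the 5% critical value of chi-square with one degree of freedom.
    for upper, lowers, sup, conf, chi in rule_groups(ROWS, "C", 2, 0.8, 3.84):
        print(sorted(upper), [sorted(lb) for lb in lowers], sup, conf, round(chi, 2))
    # prints: ['a', 'b', 'c'] [['c'], ['a', 'b']] 2 1.0 4.0
```

On this toy data the sketch reports a single group for class C, with upper bound {a, b, c} and lower bounds {c} and {a, b}. Every antecedent lying between a lower bound and the upper bound ({c}, {a, b}, {a, c}, {b, c}, {a, b, c}) yields a rule with support 2 and confidence 1.0, so reporting one group instead of five separate rules shrinks the output. Exhaustive row enumeration is exponential in the number of rows, which is tolerable here only because microarray-style data have few rows.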

Index Terms
- Information systems
  - Information systems applications

Published in
SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN:1581138598
DOI:10.1145/1007568
Conference Chairs:
Arnd Christian König (Microsoft Research)
Stefan Dessloch (University of Kaiserslautern, Germany)
General Chair:
Patrick Valduriez (INRIA, France)
Program Chair:
Gerhard Weikum (University of the Saarland)
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
Article Metrics
- 87 Total Citations
- 1,330 Total Downloads
- Downloads (Last 12 months): 17
- Downloads (Last 6 weeks): 2