An effective hash-based algorithm for mining association rules

Authors:
Jong Soo Park, IBM Thomas J. Watson Research Center, Yorktown Heights, New York
Ming-Syan Chen, IBM Thomas J. Watson Research Center, Yorktown Heights, New York
Philip S. Yu, IBM Thomas J. Watson Research Center, Yorktown Heights, New York

ACM SIGMOD Record, Volume 24, Issue 2, May 1995, pp. 175-186. https://doi.org/10.1145/568271.223813

Published: 22 May 1995

Abstract

In this paper, we examine the problem of mining association rules among items in a large database of sales transactions. Mining association rules can be mapped to the problem of discovering large itemsets, where a large itemset is a group of items that appear together in a sufficient number of transactions (at least a user-specified minimum support). Large itemsets can be discovered by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large-itemset requirement. This is generally done iteratively, finding the large k-itemsets in increasing order of k, where a large k-itemset is a large itemset with k items. Determining the large itemsets from a huge number of candidate itemsets in the early iterations is usually the dominant factor in overall data mining performance. To address this issue, we propose an effective hash-based algorithm for candidate set generation. Specifically, the proposed algorithm generates orders of magnitude fewer candidate 2-itemsets than previous methods do, thus resolving the performance bottleneck. Moreover, generating smaller candidate sets enables us to trim the transaction database effectively at a much earlier stage of the iterations, which significantly reduces the computational cost of later iterations. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
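The level-wise procedure, the hash-based filtering of candidate 2-itemsets, and the early trimming of the transaction database described above can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the paper's implementation: the function names (`mine_large_itemsets`, `apriori_gen`, `trim`, `bucket_of`), the default bucket count, the use of Python's built-in `hash`, and the exact trimming rule are all our own hypothetical choices. In the first pass, the sketch counts single items while hashing every 2-subset of each transaction into a bucket; a pair of large items becomes a candidate 2-itemset only if its bucket count reaches the minimum support. Because a bucket's count is never smaller than the support of any pair hashed into it, this filter cannot discard a truly large 2-itemset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch of hash-based candidate filtering plus database
# trimming; names and rules are illustrative, not the paper's exact method.


def bucket_of(pair, num_buckets):
    # Assumed hash function: Python's built-in hash() is consistent within a
    # run, which is all the bucket filter needs.
    return hash(pair) % num_buckets


def apriori_gen(large_k, k):
    """Join large k-itemsets sharing a (k-1)-prefix, then prune any
    candidate that has a k-subset which is not large."""
    keys = sorted(large_k)
    out = []
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order keeps equal prefixes contiguous
            cand = a + (b[-1],)
            if all(sub in large_k for sub in combinations(cand, k)):
                out.append(cand)
    return out


def trim(transactions, large_k, k):
    """Drop items and transactions that cannot support a large (k+1)-itemset.

    Each item of a large (k+1)-itemset X contained in t lies in k of X's
    k-subsets, which are all large and contained in t, so items covered by
    fewer than k large k-itemsets of t are removed (an assumed, simplified rule).
    """
    trimmed = []
    for t in transactions:
        tset = set(t)
        cover = Counter()
        for s in large_k:
            if tset.issuperset(s):
                cover.update(s)
        kept = [item for item in t if cover[item] >= k]
        if len(kept) > k:  # need at least k+1 items for a (k+1)-itemset
            trimmed.append(kept)
    return trimmed


def mine_large_itemsets(transactions, min_support, num_buckets=101):
    """Return {sorted itemset tuple: support count} for all large itemsets."""
    transactions = [sorted(set(t)) for t in transactions]

    # Pass 1: count items and hash every 2-subset into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for t in transactions:
        item_counts.update(t)
        for pair in combinations(t, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1
    large = {(i,): n for i, n in item_counts.items() if n >= min_support}
    result = dict(large)

    # Candidate 2-itemsets: pairs of large items whose bucket is large.
    # A bucket count is >= the support of every pair hashed into it,
    # so no large 2-itemset is filtered out.
    large_items = sorted(i for (i,) in large)
    candidates = [p for p in combinations(large_items, 2)
                  if bucket_counts[bucket_of(p, num_buckets)] >= min_support]
    transactions = trim(transactions, large, 1)

    k = 2
    while candidates:
        counts = Counter()
        for t in transactions:
            tset = set(t)
            for c in candidates:
                if tset.issuperset(c):
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= min_support}
        result.update(large)
        transactions = trim(transactions, large, k)  # shrink the database early
        candidates = apriori_gen(large, k)
        k += 1
    return result
```

For example, with `min_support=2` on the toy transactions `[["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]`, the sketch finds `("B", "C", "E")` as the only large 3-itemset. The number of buckets trades memory for filtering power: fewer buckets cause more collisions and let more non-large pairs through as candidates, but they never change the final answer, since every candidate is still counted exactly against the (trimmed) database.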

Published in
ACM SIGMOD Record, Volume 24, Issue 2, May 1995, 490 pages
ISSN: 0163-5808
DOI: 10.1145/568271

SIGMOD '95: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, June 1995, 508 pages
ISBN: 0897917316
DOI: 10.1145/223784
Editors: Michael Carey (Univ. of Wisconsin) and Donovan Schneider (Red Brick Systems)

Copyright © 1995 ACM
Publisher: Association for Computing Machinery, New York, NY, United States