DOI: 10.1145/276304.276308

Parallel mining algorithms for generalized association rules with classification hierarchy

Published: 01 June 1998

ABSTRACT

Association rule mining has recently attracted strong attention. A classification hierarchy over the data items is usually available, and users are interested in generalized association rules that span different levels of that hierarchy, since taking the hierarchy into account can sometimes yield more interesting rules.
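To make rules that span hierarchy levels concrete, here is a minimal sketch that counts pair support after extending each transaction with the ancestors of its items. The toy hierarchy, the transactions, and every function name are assumptions chosen for illustration; none of them come from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical classification hierarchy, stored as child -> parent.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}


def ancestors(item):
    """Return all ancestors of an item, nearest first."""
    out = []
    while item in PARENT:
        item = PARENT[item]
        out.append(item)
    return out


def extend(transaction):
    """Add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def is_redundant(pair):
    """A pair made of an item and its own ancestor carries no information."""
    a, b = pair
    return a in ancestors(b) or b in ancestors(a)


def pair_support(transactions):
    """Count 2-itemsets over ancestor-extended transactions."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(extend(t)), 2):
            if not is_redundant(pair):
                counts[pair] += 1
    return counts


# Hypothetical transactions.
TRANSACTIONS = [
    {"shirt"},
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shoes"},
    {"shoes"},
    {"jacket"},
]
counts = pair_support(TRANSACTIONS)
```

In this toy data the generalized pair ("hiking boots", "outerwear") is supported by two transactions, while each leaf-level pair that contributes to it is supported by only one. A pattern can therefore reach a support threshold at a higher level of the hierarchy even when no leaf-level pair does.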

In this paper, we propose new parallel algorithms for mining association rules with a classification hierarchy on a shared-nothing parallel machine, with the aim of improving mining performance. Our algorithms partition the candidate itemsets over the processors, which exploits the aggregate memory of the system effectively. If the candidate itemsets are partitioned without considering the classification hierarchy, both the items and all of their ancestor items have to be transmitted, which causes a prohibitively large amount of communication. Our method minimizes interprocessor communication by taking the hierarchy into account. Moreover, our algorithms fully utilize the available memory space by identifying the frequently occurring candidate itemsets and copying them to all processors, so that frequent itemsets can be processed locally without any communication. This effectively reduces the load skew among the processors. We conducted several experiments that vary the granule of the copied itemsets, from the whole tree down to a small group of frequent itemsets along the hierarchy. The coarser the grain, the simpler the control, but the harder it is to achieve sufficient load balance. The finer the grain, the more complicated the control, but the better the load can be balanced.
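The abstract does not spell out the partitioning scheme, so the following is only a sketch of one way hierarchy-aware partitioning could work, under stated assumptions. Each candidate itemset is assigned to a processor by hashing the root items of its members, so a candidate and all of its generalizations share one owner. The sender can then transmit only the original items, and the receiver derives the ancestors itself. The hierarchy, the `owner` and `route` functions, and the `duplicated_roots` parameter are all hypothetical.

```python
import zlib
from itertools import combinations

# Hypothetical classification hierarchy, stored as child -> parent.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}


def root(item):
    """Follow parent links up to the top of the item's tree."""
    while item in PARENT:
        item = PARENT[item]
    return item


def root_key(itemset):
    """Sorted tuple of the root items of an itemset."""
    return tuple(sorted(root(i) for i in itemset))


def owner(itemset, n_procs):
    """Processor id for a candidate, computed from its roots only.

    Replacing an item by any of its ancestors leaves the roots unchanged,
    so every generalization of a candidate has the same owner.
    """
    key = ",".join(root_key(itemset))
    return zlib.crc32(key.encode()) % n_procs


def route(transaction, n_procs, duplicated_roots=frozenset()):
    """Map processor id -> items to send for counting 2-itemsets.

    Pairs whose root key is in duplicated_roots are assumed to be copied
    to every processor, so they are counted locally and send nothing.
    For simplicity, the case where the owner is the sender is not
    treated specially.
    """
    messages = {}
    for a, b in combinations(sorted(transaction), 2):
        if root_key((a, b)) in duplicated_roots:
            continue
        messages.setdefault(owner((a, b), n_procs), set()).update((a, b))
    return messages


msgs = route({"jacket", "hiking boots"}, n_procs=4)
local = route({"jacket", "hiking boots"}, n_procs=4,
              duplicated_roots={("clothes", "footwear")})
```

Here `duplicated_roots` copies candidates at the granularity of whole trees, which roughly corresponds to the coarsest granule mentioned in the abstract. A finer-grained variant would need to track individual frequent itemsets rather than root keys.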

We implemented the proposed algorithms on an IBM SP-2. Performance evaluations show that our algorithms handle skew effectively and attain a sufficient speedup ratio.


Published in

SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data
June 1998
599 pages
ISBN: 0897919955
DOI: 10.1145/276304

Copyright © 1998 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 785 of 4,003 submissions (20%)
