Parallel mining algorithms for generalized association rules with classification hierarchy

ABSTRACT
Association rule mining has recently attracted strong attention. Usually, a classification hierarchy over the data items is available, and users are interested in generalized association rules that span different levels of the hierarchy, since more interesting rules can sometimes be derived by taking the hierarchy into account.
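To make the notion of a generalized itemset concrete, here is a minimal Python sketch of one standard way to count such itemsets: extend each transaction with the ancestors of its items, then count itemsets over the extended transactions. The taxonomy, item names, and transactions are hypothetical, and this sequential sketch is not the parallel algorithm proposed in this paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy (child -> parent); items without an entry are roots.
PARENT = {
    "jacket": "outerwear", "ski pants": "outerwear",
    "outerwear": "clothes", "shirt": "clothes",
    "shoes": "footwear", "hiking boots": "footwear",
}

def ancestors(item):
    """All ancestors of item, nearest first."""
    chain = []
    while item in PARENT:
        item = PARENT[item]
        chain.append(item)
    return chain

def extend(transaction):
    """Add every ancestor of every purchased item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended

def pair_support(transactions):
    """Support counts of generalized 2-itemsets, skipping pairs in which one
    item is an ancestor of the other (such pairs co-occur trivially)."""
    counts = Counter()
    for t in transactions:
        for a, b in combinations(sorted(extend(t)), 2):
            if a in ancestors(b) or b in ancestors(a):
                continue
            counts[(a, b)] += 1
    return counts

transactions = [
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shirt"},
    {"shoes"},
]
counts = pair_support(transactions)
```

With a minimum support of two transactions, neither (hiking boots, jacket) nor (hiking boots, ski pants) is frequent, yet the generalized pair (hiking boots, outerwear) is: a rule that only appears once the hierarchy is taken into account.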
In this paper, we propose new parallel algorithms for mining association rules with a classification hierarchy on a shared-nothing parallel machine in order to improve mining performance. Our algorithms partition the candidate itemsets over the processors, which exploits the aggregate memory of the system effectively. If the candidate itemsets are partitioned without considering the classification hierarchy, both the items and all of their ancestor items have to be transmitted, which causes a prohibitively large amount of communication. Our method minimizes interprocessor communication by taking the hierarchy into account. Moreover, our algorithms fully utilize the available memory space by identifying the frequently occurring candidate itemsets and copying them to all the processors, so that these frequent itemsets can be processed locally without any communication. This effectively reduces the load skew among the processors. We conducted several experiments varying the granule of the copied itemsets, from whole hierarchy trees down to small groups of frequent itemsets along the hierarchy. The coarser the grain, the simpler the control, but the harder it is to achieve sufficient load balance. The finer the grain, the more complicated the control, but the better the load can be balanced.
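The communication argument can be illustrated with a rough traffic model. This sketch assumes that candidates are assigned to processors by a hash, that transactions contain only leaf items, and that every processor holds a copy of the taxonomy. `naive_traffic` hashes each candidate on its own items, so the sender must expand ancestors and ship them; `hierarchy_aware_traffic` hashes on the roots of the items, which is one plausible reading of "considering the hierarchy", not necessarily the paper's exact scheme. All function names, the taxonomy, and the crc32-based hash are hypothetical.

```python
import zlib
from collections import Counter, defaultdict
from itertools import combinations, combinations_with_replacement

# Hypothetical taxonomy (child -> parent); items without an entry are roots.
PARENT = {
    "jacket": "outerwear", "ski pants": "outerwear",
    "outerwear": "clothes", "shirt": "clothes",
    "shoes": "footwear", "hiking boots": "footwear",
}

def ancestors(item):
    chain = []
    while item in PARENT:
        item = PARENT[item]
        chain.append(item)
    return chain

def root(item):
    chain = ancestors(item)
    return chain[-1] if chain else item

def owner(key, num_procs):
    # Deterministic hash; Python's built-in hash() of str is salted per run.
    return zlib.crc32(repr(key).encode()) % num_procs

def naive_traffic(transaction, k, num_procs):
    """Items shipped when candidates are hashed on their own items: the sender
    expands ancestors and ships every item each owner needs.
    Simplification: local deliveries are counted as traffic too."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    per_dest = defaultdict(set)
    for itemset in combinations(sorted(extended), k):
        if any(a in ancestors(b) or b in ancestors(a)
               for a, b in combinations(itemset, 2)):
            continue  # an item together with its own ancestor is redundant
        per_dest[owner(itemset, num_procs)].update(itemset)
    return sum(len(items) for items in per_dest.values())

def hierarchy_aware_traffic(transaction, k, num_procs):
    """Items shipped when candidates are hashed on the roots of their items:
    every generalization of an item shares its root, so only the original
    items are shipped and the receiver regenerates ancestors locally."""
    by_root = defaultdict(set)
    for item in transaction:
        by_root[root(item)].add(item)
    per_dest = defaultdict(set)
    for roots in combinations_with_replacement(sorted(by_root), k):
        # A root used m times needs at least m original items beneath it,
        # otherwise no valid (ancestor-free) itemset has these roots.
        if any(n > len(by_root[r]) for r, n in Counter(roots).items()):
            continue
        dest = owner(roots, num_procs)
        for r in set(roots):
            per_dest[dest].update(by_root[r])
    return sum(len(items) for items in per_dest.values())

t = {"jacket", "hiking boots"}
naive = naive_traffic(t, 2, 4)
aware = hierarchy_aware_traffic(t, 2, 4)
```

For this two-item transaction, the root-based scheme ships only the two original items to a single processor, whereas the item-based scheme must ship at least five items, because outerwear, clothes and footwear travel alongside the originals.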
We implemented the proposed algorithms on an IBM SP-2. Performance evaluations show that our algorithms handle skew effectively and attain a sufficient speedup ratio.
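The trade-off between copy granules can be illustrated with a small load model. This is a minimal sketch under several assumptions that are not taken from the paper: a candidate's work is approximated by the number of count increments it receives; a partitioned candidate puts all of its work on its owner, while a copied candidate is counted locally, so (with transactions spread evenly) its work splits evenly over all processors; the memory budget is measured in copied candidates; and copies are chosen greedily. The names `max_load`, `choose_duplicates`, and the candidate data are hypothetical.

```python
def max_load(candidates, num_procs, duplicated):
    """candidates: {name: (owner, increments)}. Returns the heaviest
    per-processor load. Copied candidates are counted locally everywhere,
    so their work is split evenly (partial counts are summed once per pass)."""
    load = [0.0] * num_procs
    for name, (owner, count) in candidates.items():
        if name in duplicated:
            for p in range(num_procs):
                load[p] += count / num_procs
        else:
            load[owner] += count
    return max(load)

def choose_duplicates(candidates, groups, budget):
    """Greedy: copy whole groups (the copy granule), busiest group first,
    as long as the number of copied candidates stays within budget."""
    ranked = sorted(groups.values(),
                    key=lambda members: -sum(candidates[c][1] for c in members))
    chosen, used = set(), 0
    for members in ranked:
        if used + len(members) <= budget:
            chosen.update(members)
            used += len(members)
    return chosen

# Hypothetical candidates: name -> (owning processor, expected increments).
# Each hierarchy tree lives on a single processor.
CANDIDATES = {
    "A": (0, 50), "B": (0, 30), "C": (0, 5),
    "D": (1, 10), "E": (1, 5), "F": (0, 5),
}
TREES = {"tree1": ["A", "C", "F"], "tree2": ["B"], "tree3": ["D", "E"]}
SINGLES = {name: [name] for name in CANDIDATES}  # finest granule

baseline = max_load(CANDIDATES, 2, set())
coarse = max_load(CANDIDATES, 2, choose_duplicates(CANDIDATES, TREES, budget=2))
fine = max_load(CANDIDATES, 2, choose_duplicates(CANDIDATES, SINGLES, budget=2))
```

With a budget of two copies, the busiest tree (three candidates) does not fit, so the tree granule can only copy the smaller tree holding B; the itemset granule copies the two hottest candidates directly and yields the lowest maximum load. This mirrors the stated trade-off: coarse grains are simpler to manage, fine grains balance better.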