ABSTRACT
This paper studies the problem of frequent pattern mining with uncertain data. We show how broad classes of algorithms can be extended to the uncertain data setting. In particular, we study candidate generate-and-test algorithms, hyper-structure algorithms, and pattern-growth-based algorithms. A key observation is that these classes of algorithms behave very differently in experiments in the uncertain case than in the deterministic case. Specifically, the hyper-structure and candidate generate-and-test algorithms perform much better than the tree-based pattern-growth algorithms. This counter-intuitive behavior has important implications for algorithm design in the uncertain version of the problem. We test the approaches on a number of real and synthetic data sets and show that two of our approaches are more effective than competing techniques.
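The abstract does not spell out the uncertain-data model. A model commonly used in this line of work, assumed here, attaches an existence probability to each item in each transaction, treats items as independent, and replaces support with expected support: the sum, over transactions, of the product of the probabilities of the itemset's items. Because every probability is at most 1, expected support can only shrink as an itemset grows, so the Apriori-style join-and-prune loop of a candidate generate-and-test algorithm still applies. The sketch below illustrates that idea only; the names `expected_support` and `uncertain_apriori` and the dict-per-transaction layout are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import prod

# Hypothetical representation (assumption): each transaction maps an item
# to its existence probability in that transaction.
Transaction = dict[str, float]


def expected_support(itemset: frozenset[str], db: list[Transaction]) -> float:
    """Sum over transactions of the product of the itemset's item
    probabilities, assuming items occur independently."""
    total = 0.0
    for t in db:
        if itemset.issubset(t.keys()):
            total += prod(t[i] for i in itemset)
    return total


def uncertain_apriori(db: list[Transaction],
                      min_esup: float) -> dict[frozenset[str], float]:
    """Level-wise generate-and-test: return every itemset whose expected
    support is at least min_esup, mapped to that expected support."""
    frequent: dict[frozenset[str], float] = {}

    # Level 1: test each single item.
    level: list[frozenset[str]] = []
    for item in sorted({i for t in db for i in t}):
        s = frozenset([item])
        es = expected_support(s, db)
        if es >= min_esup:
            frequent[s] = es
            level.append(s)

    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: expected support is anti-monotone, so every
        # (k-1)-subset of a frequent k-itemset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Test step: one pass over the data per candidate.
        level = []
        for c in candidates:
            es = expected_support(c, db)
            if es >= min_esup:
                frequent[c] = es
                level.append(c)
        k += 1
    return frequent
```

For example, with `db = [{"a": 0.9, "b": 0.8}, {"a": 0.5, "c": 0.7}, {"a": 0.6, "b": 0.5, "c": 0.4}]` and `min_esup = 1.0`, the items `a`, `b`, and `c` are frequent and `{a, b}` qualifies with expected support 0.72 + 0.30 = 1.02, while `{a, c}` (0.59) and `{b, c}` (0.20) are pruned. The repeated full scans in the test step are the simplest possible choice and are not meant to reflect how the paper's implementations count support.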