ABSTRACT
In many applications, graph data are uncertain because the underlying data are incomplete or imprecise. Mining uncertain graph data is semantically different from, and computationally more challenging than, mining exact graph data. This paper investigates the problem of mining frequent subgraph patterns from uncertain graph data. The problem is formalized through a new measure called expected support. An approximate mining algorithm is proposed that finds an approximate set of frequent subgraph patterns by allowing an error tolerance on the expected supports of the discovered patterns. The algorithm uses an efficient approximation algorithm to decide whether a subgraph pattern should be output. Analytical and experimental results show that the algorithm is efficient, accurate, and scalable on large uncertain graph databases.
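The abstract names expected support without defining it. As a rough illustration only, the sketch below assumes a possible-world reading: each edge of an uncertain graph exists independently with a given probability, a pattern occurs in a graph when all edges of at least one of its embeddings are present, and expected support is the mean occurrence probability over the database. The independence assumption, the pre-listed embeddings, all function names, and the naive sampling estimator are assumptions made for this example; they are not the paper's approximation algorithm.

```python
import random
from itertools import product

# Hypothetical representation: an uncertain graph is a dict mapping an edge
# (a tuple of two vertex labels) to its existence probability. The embeddings
# of a pattern are given up front as frozensets of edges (an assumption that
# sidesteps subgraph isomorphism testing).

def occurrence_probability_exact(edge_probs, embeddings):
    """Exact probability that at least one embedding is fully present,
    computed by enumerating every possible world (exponential in #edges)."""
    edges = list(edge_probs)
    total = 0.0
    for world in product([False, True], repeat=len(edges)):
        present = {e for e, keep in zip(edges, world) if keep}
        weight = 1.0
        for e, keep in zip(edges, world):
            weight *= edge_probs[e] if keep else 1.0 - edge_probs[e]
        if any(emb <= present for emb in embeddings):
            total += weight
    return total

def occurrence_probability_sampled(edge_probs, embeddings, samples, rng):
    """Naive Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(samples):
        present = {e for e, p in edge_probs.items() if rng.random() < p}
        if any(emb <= present for emb in embeddings):
            hits += 1
    return hits / samples

def expected_support(database, prob_fn):
    """Mean occurrence probability over all uncertain graphs in the database."""
    return sum(prob_fn(ep, embs) for ep, embs in database) / len(database)

# Toy database: the pattern is a single edge.
db = [
    # Graph 1: two uncertain edges, each one is an embedding of the pattern.
    ({("a", "b"): 0.5, ("b", "c"): 0.5},
     [frozenset({("a", "b")}), frozenset({("b", "c")})]),
    # Graph 2: one uncertain edge.
    ({("a", "b"): 0.4}, [frozenset({("a", "b")})]),
]

exact = expected_support(db, occurrence_probability_exact)
rng = random.Random(0)
approx = expected_support(
    db, lambda ep, embs: occurrence_probability_sampled(ep, embs, 20000, rng)
)
print(exact, approx)
```

In this toy database the exact value is (0.75 + 0.4) / 2 = 0.575, since the pattern is missing from the first graph only when both edges are absent (probability 0.25). Exact enumeration grows exponentially with the number of uncertain edges, which is one reason an approximation with an error tolerance is attractive.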
Index Terms
- Frequent subgraph pattern mining on uncertain graph data