Abstract
Information in many applications, such as mobile wireless systems, social networks, and road networks, is captured by graphs. In many cases, such information is uncertain. We study the problem of querying a probabilistic graph, in which vertices are connected to each other probabilistically. In particular, we examine “source-to-target” queries (ST-queries), such as computing the shortest path between two vertices. The major difference from the deterministic setting is that query answers are enriched with probabilistic annotations. Evaluating ST-queries over probabilistic graphs is #P-hard, as it requires examining an exponential number of “possible worlds”, that is, database instances generated from the probabilistic graph. Existing solutions to the ST-query problem, which sample possible worlds, have two downsides: (i) a possible world can be very large and (ii) many samples are needed for reasonable accuracy. To tackle these issues, we study the ProbTree, a data structure that stores a succinct, or indexed, version of the possible worlds of the graph. Existing ST-query solutions are executed on top of this structure, which reduces both the number of samples needed and the size of each possible world. We examine lossless and lossy methods for generating the ProbTree, which reflect the tradeoff between the accuracy and efficiency of query evaluation. We analyze the correctness and complexity of these approaches. Our extensive experiments on real datasets show that the ProbTree is fast to generate and small in size. It also significantly enhances the accuracy and efficiency of existing ST-query algorithms.
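To make the sampling baseline mentioned in the abstract concrete, the following is a minimal sketch of Monte Carlo evaluation of one ST-query (reachability from a source to a target). It is not the ProbTree and not the authors' implementation. It assumes that edges exist independently of one another, and the names `sample_world`, `reachable`, `estimate_reachability`, and the toy graph `EDGES` are hypothetical, introduced only for illustration.

```python
import random
from collections import deque


def sample_world(edges, rng):
    """Draw one possible world: keep each directed edge independently
    with its own probability (independence is an assumption here)."""
    world = {}
    for (u, v), p in edges.items():
        if rng.random() < p:
            world.setdefault(u, []).append(v)
    return world


def reachable(world, source, target):
    """Breadth-first search in a single, deterministic possible world."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in world.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def estimate_reachability(edges, source, target, samples=10_000, seed=0):
    """Estimate P(target reachable from source) as the fraction of
    sampled possible worlds in which the target is reached."""
    rng = random.Random(seed)
    hits = sum(
        reachable(sample_world(edges, rng), source, target)
        for _ in range(samples)
    )
    return hits / samples


# Hypothetical toy graph: edge (u, v) -> existence probability.
EDGES = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.2}

if __name__ == "__main__":
    print(estimate_reachability(EDGES, "s", "t"))
```

For this toy graph the exact answer is 1 - (1 - 0.5 · 0.5)(1 - 0.2) = 0.4, so the estimate should be close to that value. Each sample materializes a full possible world and runs a full search over it, which illustrates the two costs the abstract names: the size of a possible world and the number of samples needed for reasonable accuracy.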
Index Terms
- An Indexing Framework for Queries on Probabilistic Graphs