research-article

An Indexing Framework for Queries on Probabilistic Graphs

Authors:
Silviu Maniu, Université Paris-Sud, Université Paris-Saclay, Orsay, France
Reynold Cheng, The University of Hong Kong, Hong Kong
Pierre Senellart, École normale supérieure, PSL Research University and Inria Paris, France; Inria Paris

ACM Transactions on Database Systems, Volume 42, Issue 2, Article No. 13, pp. 1–34. https://doi.org/10.1145/3044713

Published: 10 May 2017

Abstract

Information in many applications, such as mobile wireless systems, social networks, and road networks, is captured by graphs. In many cases, such information is uncertain. We study the problem of querying a probabilistic graph, in which vertices are connected to each other probabilistically. In particular, we examine “source-to-target” queries (ST-queries), such as computing the shortest path between two vertices. The major difference from the deterministic setting is that query answers are enriched with probabilistic annotations. Evaluating ST-queries over probabilistic graphs is #P-hard, and a naive exact evaluation must examine an exponential number of “possible worlds,” that is, database instances generated from the probabilistic graph. Existing solutions to the ST-query problem, which sample possible worlds, have two downsides: (i) a possible world can be very large, and (ii) many samples are needed for reasonable accuracy. To tackle these issues, we study the ProbTree, a data structure that stores a succinct, or indexed, version of the possible worlds of the graph. Existing ST-query algorithms can be executed on top of this structure, which reduces both the number of samples needed and the size of the possible worlds. We examine lossless and lossy methods for generating the ProbTree, which reflect the tradeoff between the accuracy and efficiency of query evaluation. We analyze the correctness and complexity of these approaches. Our extensive experiments on real datasets show that the ProbTree is fast to generate and small in size. It also significantly improves the accuracy and efficiency of existing ST-query algorithms.
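The possible-worlds semantics and the sampling baseline described above can be illustrated with a small Python sketch. It assumes, for illustration only, that each edge exists independently with its stated probability, and it uses s-t reachability as the representative ST-query; all function names are hypothetical. It does not implement the ProbTree. It only shows the exact enumeration over 2^m worlds and the Monte Carlo sampling over full possible worlds that such an index is meant to make cheaper.

```python
import itertools
import math
import random
from collections import deque
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: directed edge (u, v, p) exists with probability p,
# independently of every other edge (an assumption made for this sketch).
Edge = Tuple[str, str, float]


def reachable(edges: Sequence[Edge], present: Sequence[bool], s: str, t: str) -> bool:
    """BFS from s using only the edges present in one possible world."""
    adj: Dict[str, List[str]] = {}
    for (u, v, _), keep in zip(edges, present):
        if keep:
            adj.setdefault(u, []).append(v)
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def exact_reachability(edges: Sequence[Edge], s: str, t: str) -> float:
    """Sum the probabilities of the 2^m possible worlds in which t is reachable from s."""
    total = 0.0
    for present in itertools.product([False, True], repeat=len(edges)):
        world_prob = 1.0
        for (_, _, p), keep in zip(edges, present):
            world_prob *= p if keep else 1.0 - p
        if reachable(edges, present, s, t):
            total += world_prob
    return total


def sampled_reachability(edges: Sequence[Edge], s: str, t: str,
                         samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of sampled possible worlds in which t is reachable."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one possible world by flipping an independent coin per edge.
        present = [rng.random() < p for (_, _, p) in edges]
        if reachable(edges, present, s, t):
            hits += 1
    return hits / samples


def hoeffding_samples(eps: float, delta: float) -> int:
    """Samples sufficient for |estimate - truth| <= eps with probability >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


if __name__ == "__main__":
    # Toy graph: s->t directly, or via a; every edge has probability 0.5.
    g = [("s", "a", 0.5), ("a", "t", 0.5), ("s", "t", 0.5)]
    print(exact_reachability(g, "s", "t"))            # 0.625 = 1 - 0.5 * (1 - 0.25)
    print(sampled_reachability(g, "s", "t", 20000))   # close to 0.625
    print(hoeffding_samples(0.01, 0.05))              # 18445
```

The `hoeffding_samples` helper applies the standard Hoeffding bound n ≥ ln(2/δ)/(2ε²) for averages of independent 0/1 outcomes, which is one way to quantify downside (ii): an additive error of 0.01 at 95% confidence already calls for 18,445 sampled worlds, each of which must be materialized and traversed in full (downside (i)).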

Index Terms

1. Information systems
  1. Data management systems
    1. Database design and models
      1. Data model extensions
        1. Uncertainty
      2. Graph-based database models
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)


Published in
ACM Transactions on Database Systems Volume 42, Issue 2
Invited Paper from SIGMOD 2015, Invited Paper from PODS 2015 and Regular Papers
June 2017
251 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/3086510
Editor: Christian S. Jensen, Aalborg University, Denmark
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 10 May 2017
- Accepted: 1 January 2017
- Revised: 1 October 2016
- Received: 1 July 2015
Author Tags
Reachability
SPQR
shortest path
tree decomposition
treewidth
triconnected component
uncertain graph
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 16
- Total Downloads: 364
- Downloads (Last 12 months): 20
- Downloads (Last 6 weeks): 4