research-article

An Indexing Framework for Queries on Probabilistic Graphs

Published: 10 May 2017

Abstract

Information in many applications, such as mobile wireless systems, social networks, and road networks, is captured by graphs. In many cases, such information is uncertain. We study the problem of querying a probabilistic graph, in which vertices are connected to each other probabilistically. In particular, we examine “source-to-target” queries (ST-queries), such as computing the shortest path between two vertices. The major difference with the deterministic setting is that query answers are enriched with probabilistic annotations. Evaluating ST-queries over probabilistic graphs is #P-hard, as it requires examining an exponential number of “possible worlds”—database instances generated from the probabilistic graph. Existing solutions to the ST-query problem, which sample possible worlds, have two downsides: (i) a possible world can be very large and (ii) many samples are needed for reasonable accuracy. To tackle these issues, we study the ProbTree, a data structure that stores a succinct, or indexed, version of the possible worlds of the graph. Existing ST-query solutions are executed on top of this structure, with the number of samples and sizes of the possible worlds reduced. We examine lossless and lossy methods for generating the ProbTree, which reflect the tradeoff between the accuracy and efficiency of query evaluation. We analyze the correctness and complexity of these approaches. Our extensive experiments on real datasets show that the ProbTree is fast to generate and small in size. It also enhances the accuracy and efficiency of existing ST-query algorithms significantly.
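The possible-world semantics and the sampling baseline described in the abstract can be illustrated with a minimal sketch. It is not the ProbTree and not the authors' code: it assumes edges exist independently with a given probability, uses reachability as the ST-query, and the toy graph `EDGES` and all function names are hypothetical. `exact_reachability` enumerates all 2^|E| possible worlds, which is only feasible for tiny graphs, while `sampled_reachability` estimates the same probability by sampling worlds.

```python
import itertools
import random
from collections import deque

# Hypothetical toy graph: directed edge (u, v) -> probability that the edge exists.
# Independence between edges is an assumption made for this sketch.
EDGES = {
    ("s", "a"): 0.5,
    ("a", "t"): 0.5,
    ("s", "t"): 0.5,
}


def reachable(present_edges, source, target):
    """Breadth-first search over the edges present in one possible world."""
    adjacency = {}
    for u, v in present_edges:
        adjacency.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def exact_reachability(edges, source, target):
    """Sum the probabilities of all possible worlds where target is reachable.

    Enumerates 2^|E| worlds, so it is exponential in the number of edges.
    """
    edge_list = list(edges)
    total = 0.0
    for mask in itertools.product([False, True], repeat=len(edge_list)):
        weight = 1.0
        present = []
        for edge, keep in zip(edge_list, mask):
            weight *= edges[edge] if keep else 1.0 - edges[edge]
            if keep:
                present.append(edge)
        if reachable(present, source, target):
            total += weight
    return total


def sampled_reachability(edges, source, target, samples, seed=0):
    """Monte Carlo estimate: sample worlds and count those where target is reachable."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        present = [edge for edge, p in edges.items() if rng.random() < p]
        if reachable(present, source, target):
            hits += 1
    return hits / samples
```

On this toy graph the exact answer is 1 - (1 - 0.5)(1 - 0.5 · 0.5) = 0.625. The sampled estimate approaches it as `samples` grows, which reflects the two costs the abstract names: each sample materializes a whole possible world, and many samples are needed for a tight estimate.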

        • Published in

          ACM Transactions on Database Systems  Volume 42, Issue 2
          Invited Paper from SIGMOD 2015, Invited Paper from PODS 2015 and Regular Papers
          June 2017
          251 pages
          ISSN:0362-5915
          EISSN:1557-4644
          DOI:10.1145/3086510

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 10 May 2017
          • Accepted: 1 January 2017
          • Revised: 1 October 2016
          • Received: 1 July 2015

          Qualifiers

          • research-article
          • Research
          • Refereed
