DOI: 10.1145/1871437.1871503
research-article

Fast and accurate estimation of shortest paths in large graphs

Published: 26 October 2010

ABSTRACT

Computing shortest paths between two given nodes is a fundamental operation over graphs, but it is known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining the actual shortest paths (i.e., the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs, it is often essential to find many, if not all, shortest paths between two given nodes.

In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances, but also computes the corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs.
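To make the idea of a path-producing sketch concrete, the following is a minimal illustrative sketch in Python. It is not the index described in the paper: it assumes a small, undirected, unweighted, in-memory graph given as an adjacency dictionary, and all function names (`build_sketches`, `estimate_path`, `remove_cycles`, and so on) are hypothetical. Each node stores paths to a few randomly sampled landmark (seed) nodes; a query joins the two stored paths at a common landmark and then removes cycles from the joined walk, which is one simple way in which having the actual path can tighten the estimate.

```python
import random
from collections import deque


def bfs_from_seeds(graph, seeds):
    """Multi-source BFS; parent pointers lead each node toward its nearest seed."""
    parent = {s: None for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def path_to_seed(parent, node):
    """Follow parent pointers from node to its seed: [node, ..., seed]."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def build_sketches(graph, rounds=2, rng=None):
    """For each node, store the shortest known path to each landmark it has seen.

    Seed sets of sizes 1, 2, 4, ... are sampled in every round (an assumption
    made for this illustration).
    """
    rng = rng or random.Random(0)
    nodes = sorted(graph)
    sketches = {v: {} for v in nodes}
    for _ in range(rounds):
        size = 1
        while size <= len(nodes):
            parent = bfs_from_seeds(graph, rng.sample(nodes, size))
            for v in parent:
                path = path_to_seed(parent, v)
                seed = path[-1]
                known = sketches[v].get(seed)
                if known is None or len(path) < len(known):
                    sketches[v][seed] = path
            size *= 2
    return sketches


def remove_cycles(path):
    """Cut every loop out of a walk so that no node appears twice."""
    out, index = [], {}
    for v in path:
        if v in index:
            del out[index[v] + 1:]
            index = {n: i for i, n in enumerate(out)}
        else:
            index[v] = len(out)
            out.append(v)
    return out


def estimate_path(sketches, s, t):
    """Return an approximate shortest path from s to t, or None if no
    landmark is shared by the two sketches."""
    if s == t:
        return [s]
    best = None
    for seed in sketches[s].keys() & sketches[t].keys():
        # s -> seed, followed by seed -> t (reversed stored path, seed dropped).
        walk = sketches[s][seed] + sketches[t][seed][-2::-1]
        candidate = remove_cycles(walk)
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


if __name__ == "__main__":
    # A small undirected example graph (hypothetical data).
    g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    sk = build_sketches(g)
    print(estimate_path(sk, 0, 4))
```

The estimated distance is the number of edges on the returned path, so it can never fall below the true distance; it is exact whenever some shared landmark lies on a shortest path or the cycle removal happens to recover one.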

We evaluate our techniques, implemented within a fully functional RDF graph database system, over large real-world social and biological networks with sizes ranging from tens of thousands to millions of nodes and edges. Experiments on several datasets show query response times several orders of magnitude faster than traditional path computations, while keeping the estimation errors between 0% and 1% on average.

Published in

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN: 9781450300995
DOI: 10.1145/1871437

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
