ABSTRACT
Computing shortest paths between two given nodes is a fundamental operation over graphs, but it is known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining the actual shortest paths (i.e., the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs, it is often essential to find many, if not all, shortest paths between two given nodes.
In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances, but also computes the corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs.
We evaluate our techniques, implemented within a fully functional RDF graph database system, over large real-world social and biological networks with sizes ranging from tens of thousands to millions of nodes and edges. Experiments on several datasets show that our approach achieves query response times several orders of magnitude faster than traditional path computations while keeping the estimation errors between 0% and 1% on average.
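The abstract does not spell out how the index is built, so the following is only a minimal sketch of the general idea behind landmark-based path sketches, not the paper's actual index structure. It assumes an undirected, unweighted graph held in memory as an adjacency dictionary, and every name in it (`bfs_tree`, `build_sketches`, `estimate_path`) is hypothetical. Each landmark stores a breadth-first tree. A query joins the tree paths of the two endpoints at their lowest common ancestor, and keeps the shortest candidate over all landmarks.

```python
from collections import deque


def bfs_tree(adj, root):
    """Return {node: parent} for a BFS tree rooted at `root` (root maps to None)."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def build_sketches(adj, landmarks):
    """Precompute one BFS tree per landmark (the 'index' in this toy setting)."""
    return {landmark: bfs_tree(adj, landmark) for landmark in landmarks}


def path_to_root(parent, node):
    """Follow parent pointers from `node` up to the landmark."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def estimate_path(sketches, s, t):
    """Return an approximate shortest path from s to t as a list of nodes, or None."""
    best = None
    for parent in sketches.values():
        if s not in parent or t not in parent:
            continue  # this landmark cannot reach one of the endpoints
        ps = path_to_root(parent, s)  # s ... landmark
        pt = path_to_root(parent, t)  # t ... landmark
        # Both paths end at the landmark; drop the shared tail so that they
        # meet at their lowest common ancestor in the tree instead.
        while len(ps) > 1 and len(pt) > 1 and ps[-2] == pt[-2]:
            ps.pop()
            pt.pop()
        candidate = ps + pt[-2::-1]  # s ... meeting node ... t
        if best is None or len(candidate) < len(best):
            best = candidate
    return best
```

The returned path is always a real path in the graph, so its length is an upper bound on the true distance. In this toy setting, adding landmarks can only tighten that bound. Trimming the shared tail is one simple example of how having the actual paths, rather than only distances, can improve an estimate.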
Index Terms
- Fast and accurate estimation of shortest paths in large graphs