DOI: 10.1145/1645953.1646063
research-article

Fast shortest path distance estimation in large networks

Published: 02 November 2009

ABSTRACT

In this paper we study approximate landmark-based methods for point-to-point distance estimation in very large networks. These methods involve selecting a subset of nodes as landmarks and computing offline the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, it can be estimated quickly by combining the precomputed distances. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. We therefore explore theoretical insights to devise a variety of simple methods that scale well in very large networks. The efficiency of the suggested techniques is tested experimentally using five real-world graphs having millions of edges. While theoretical bounds support the claim that random landmarks work well in practice, our extensive experimentation shows that smart landmark selection can yield dramatically more accurate results: for a given target accuracy, our methods require as much as 250 times less space than selecting landmarks at random. In addition, we demonstrate that at a very small accuracy loss our techniques are several orders of magnitude faster than the state-of-the-art exact methods. Finally, we study an application of our methods to the task of social search in large graphs.
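The offline/online split described in the abstract can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as an adjacency dictionary, and it combines the precomputed distances through the triangle inequality: for a landmark u, |d(s,u) - d(u,t)| <= d(s,t) <= d(s,u) + d(u,t). The function names (`bfs_distances`, `precompute`, `estimate`) and the toy graph are hypothetical, and the abstract does not state which combination rule or landmark-selection strategy the paper adopts, so this is an illustration of the general idea rather than the authors' method.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def precompute(adj, landmarks):
    """Offline step: one BFS per landmark, storing distances to all nodes."""
    return {u: bfs_distances(adj, u) for u in landmarks}

def estimate(index, s, t):
    """Online step: (lower, upper) bounds on d(s, t) via the triangle inequality."""
    if s == t:
        return 0, 0
    lower, upper = 0, float("inf")
    for dist in index.values():
        # Skip landmarks that cannot reach both endpoints.
        if s in dist and t in dist:
            lower = max(lower, abs(dist[s] - dist[t]))
            upper = min(upper, dist[s] + dist[t])
    return lower, upper

# Toy example (assumed data): a path graph 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
index_mid = precompute(adj, [2])   # landmark in the middle of the path
index_end = precompute(adj, [0])   # landmark at one end of the path
print(estimate(index_mid, 0, 1))
print(estimate(index_end, 0, 4))
```

In this toy graph the end landmark gives tight bounds for the pair (0, 4), while the middle landmark gives only a loose interval for the pair (0, 1), whose true distance is 1. The gap suggests why the choice of landmarks affects accuracy. Each query costs time proportional to the number of landmarks, and the index stores one distance per node per landmark.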

    • Published in

      CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
      November 2009
      2162 pages
      ISBN: 9781605585123
      DOI: 10.1145/1645953

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
