ABSTRACT
In this paper we study approximate landmark-based methods for point-to-point distance estimation in very large networks. These methods involve selecting a subset of nodes as landmarks and computing offline the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, it can be estimated quickly by combining the precomputed distances. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. We therefore explore theoretical insights to devise a variety of simple methods that scale well in very large networks. The efficiency of the suggested techniques is tested experimentally using five real-world graphs having millions of edges. While theoretical bounds support the claim that random landmarks work well in practice, our extensive experimentation shows that smart landmark selection can yield dramatically more accurate results: for a given target accuracy, our methods require as much as 250 times less space than selecting landmarks at random. In addition, we demonstrate that at a very small accuracy loss our techniques are several orders of magnitude faster than the state-of-the-art exact methods. Finally, we study an application of our methods to the task of social search in large graphs.
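The offline/runtime split described above can be illustrated with a minimal sketch. The abstract does not say how the precomputed distances are combined, so the rule used here is an assumption: the standard triangle-inequality bounds, where for any landmark l the true distance d(s, t) lies between |d(s, l) - d(l, t)| and d(s, l) + d(l, t). The sketch also assumes an undirected, unweighted graph stored as an adjacency dictionary. The names `bfs_distances`, `precompute`, `estimate`, and `GRAPH` are hypothetical and chosen only for illustration.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def precompute(graph, landmarks):
    """Offline step: one BFS per landmark, stored as landmark -> distance map."""
    return {l: bfs_distances(graph, l) for l in landmarks}


def estimate(index, s, t):
    """Runtime step: return (lower, upper) bounds on d(s, t).

    Assumed combining rule (triangle inequality, undirected graph):
    |d(s, l) - d(l, t)| <= d(s, t) <= d(s, l) + d(l, t) for each landmark l.
    """
    lower, upper = 0, float("inf")
    for dist in index.values():
        if s in dist and t in dist:  # skip landmarks that cannot reach both nodes
            upper = min(upper, dist[s] + dist[t])
            lower = max(lower, abs(dist[s] - dist[t]))
    return lower, upper


# Hypothetical toy graph: a simple path a - b - c - d - e.
GRAPH = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
```

On this toy path, a landmark in the middle (`c`) gives a tight upper bound for the pair (`a`, `e`) but an uninformative lower bound, while adding an endpoint landmark (`a`) tightens the lower bound as well. This is a small-scale hint at why the choice of landmarks affects accuracy, not a reproduction of the selection strategies studied in the paper.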