research-article

More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks

Published: 01 September 2013

Abstract

Similarity assessment is one of the core tasks in hyperlink analysis. With the proliferation of applications such as web search and collaborative filtering, SimRank has become a well-studied measure of similarity between two nodes in a graph. It recursively follows the philosophy that "two nodes are similar if they are referenced (have incoming edges) from similar nodes", which can be viewed as an aggregation of similarities based on incoming paths. Despite its popularity, SimRank has an undesirable property, "zero-similarity": it only accommodates paths of equal length from a common "center" node, so a large portion of other paths are fully ignored. This paper attempts to remedy this issue. (1) We propose and rigorously justify SimRank*, a revised version of SimRank, which resolves such counter-intuitive "zero-similarity" issues while inheriting the merits of the basic SimRank philosophy. (2) We show that the series form of SimRank* can be reduced to a fairly succinct and elegant closed form, which looks even simpler than SimRank, yet enriches semantics without suffering from increased computational cost. This leads to a fixed-point iterative paradigm of SimRank* in O(Knm) time on a graph of n nodes and m edges for K iterations, which is comparable to SimRank. (3) To further optimize SimRank* computation, we leverage a novel clustering strategy via edge concentration. Because the edge concentration problem is NP-hard, we devise an efficient and effective heuristic that speeds up SimRank* computation to O(Knm̃) time, where m̃ is generally much smaller than m. (4) Using real and synthetic data, we empirically verify the rich semantics of SimRank*, and demonstrate its high computational efficiency.
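
To make the "zero-similarity" property concrete, the following is a minimal sketch of the basic SimRank fixed-point iteration, not of SimRank* and not the paper's optimized O(Knm) scheme. The function name `simrank`, the damping factor `C = 0.8`, the iteration count `K = 10`, and the toy graph are all assumptions chosen for illustration. The iteration is deliberately naive (it sums over all in-neighbor pairs), so it is only suitable for tiny graphs.

```python
from itertools import product


def simrank(in_neighbors, C=0.8, K=10):
    """Naive fixed-point iteration for basic SimRank.

    in_neighbors maps each node to the list of nodes that link to it.
    C (damping factor) and K (iterations) are illustrative defaults.
    """
    nodes = list(in_neighbors)
    # Initialization: a node is maximally similar to itself, 0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}
    for _ in range(K):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node with no incoming edges has similarity 0
                # to every other node.
                new[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph with edges r -> a, r -> b, b -> c.
in_neighbors = {"r": [], "a": ["r"], "b": ["r"], "c": ["b"]}
scores = simrank(in_neighbors)
print(scores[("a", "b")])  # a and b share the in-neighbor r
print(scores[("a", "c")])  # paths r -> a and r -> b -> c differ in length
```

In this toy graph, `a` and `b` are each one step from the common source `r`, so they receive a positive score. Nodes `a` and `c` are also both reachable from `r`, but over paths of length 1 and 2, so basic SimRank assigns them a score of zero. This is the kind of path the abstract says is "fully ignored".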


Published in: Proceedings of the VLDB Endowment, Volume 7, Issue 1, September 2013, 96 pages

Publisher: VLDB Endowment
