More is simpler: effectively and efficiently assessing node-pair similarities based on hyperlinks

Authors:
Weiren Yu (The University of New South Wales, Australia and NICTA, Australia)
Xuemin Lin (East China Normal University, China and The University of New South Wales, Australia)
Wenjie Zhang (The University of New South Wales, Australia)
Lijun Chang (The University of New South Wales, Australia)
Jian Pei (Simon Fraser University, Canada)

Proceedings of the VLDB Endowment, Volume 7, Issue 1, pp 13–24. https://doi.org/10.14778/2732219.2732221

Published: 01 September 2013

Abstract

Similarity assessment is one of the core tasks in hyperlink analysis. With the proliferation of applications such as web search and collaborative filtering, SimRank has become a well-studied measure of similarity between two nodes in a graph. It recursively follows the philosophy that "two nodes are similar if they are referenced (have incoming edges) from similar nodes", which can be viewed as an aggregation of similarities based on incoming paths. Despite its popularity, SimRank has an undesirable property, namely "zero-similarity": it only accommodates paths of equal length from a common "center" node, so a large portion of other paths are ignored entirely. This paper attempts to remedy this issue. (1) We propose and rigorously justify SimRank*, a revised version of SimRank, which resolves such counter-intuitive "zero-similarity" issues while inheriting the merits of the basic SimRank philosophy. (2) We show that the series form of SimRank* can be reduced to a fairly succinct and elegant closed form, which looks even simpler than SimRank, yet enriches semantics without increasing computational cost. This leads to a fixed-point iterative paradigm of SimRank* in O(Knm) time on a graph of n nodes and m edges for K iterations, which is comparable to SimRank. (3) To further optimize SimRank* computation, we leverage a novel clustering strategy via edge concentration. Because the edge concentration problem is NP-hard, we devise an efficient and effective heuristic that speeds up SimRank* computation to O(Knm̃) time, where m̃ is generally much smaller than m. (4) Using real and synthetic data, we empirically verify the rich semantics of SimRank* and demonstrate its high computational efficiency.
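
The "zero-similarity" issue can be illustrated with a minimal Python sketch on a three-node chain r -> a -> b. The `simrank` function follows the standard recursive SimRank definition of Jeh and Widom. The `simrank_star` function iterates a fixed point of the form S = (C/2)(QS + SQ^T) + (1 - C)I, where Q is the row-normalized in-neighbor matrix; this form is an assumption, since the abstract does not state the closed form, and it should be checked against the full paper. The function names, the decay factor 0.8, and the iteration count are likewise illustrative choices, not taken from the source.

```python
def in_neighbors(n, edges):
    """Return inn[v] = list of nodes u that have an edge u -> v."""
    inn = [[] for _ in range(n)]
    for u, v in edges:
        inn[v].append(u)
    return inn


def simrank(n, edges, c=0.8, iters=10):
    """Naive SimRank iteration from the recursive definition."""
    inn = in_neighbors(n, edges)
    s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        nxt = [[0.0] * n for _ in range(n)]
        for a in range(n):
            for b in range(n):
                if a == b:
                    nxt[a][b] = 1.0
                elif inn[a] and inn[b]:
                    # Average similarity over all pairs of in-neighbors.
                    total = sum(s[i][j] for i in inn[a] for j in inn[b])
                    nxt[a][b] = c * total / (len(inn[a]) * len(inn[b]))
        s = nxt
    return s


def simrank_star(n, edges, c=0.8, iters=10):
    """Assumed SimRank* fixed point: S = (c/2)(Q S + S Q^T) + (1 - c) I."""
    inn = in_neighbors(n, edges)
    s = [[(1.0 - c) if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        # (Q S)[a][b] is the average of S[i][b] over in-neighbors i of a.
        # Building Q S touches every edge once per column: O(n * m) work.
        qs = [[sum(s[i][b] for i in inn[a]) / len(inn[a]) if inn[a] else 0.0
               for b in range(n)] for a in range(n)]
        # S stays symmetric, so S Q^T equals the transpose of Q S.
        s = [[0.5 * c * (qs[a][b] + qs[b][a]) + ((1.0 - c) if a == b else 0.0)
              for b in range(n)] for a in range(n)]
    return s


# Chain r -> a -> b, with r = 0, a = 1, b = 2.
edges = [(0, 1), (1, 2)]
plain = simrank(3, edges)
star = simrank_star(3, edges)
print("SimRank  s(a, b):", plain[1][2])
print("SimRank* s(a, b):", star[1][2])
```

Here r reaches a in one step and b in two steps, so no pair of equal-length incoming paths from a common node exists and SimRank assigns s(a, b) = 0. Under the assumed SimRank* recurrence the same pair receives a positive score, and each iteration costs O(nm), in line with the O(Knm) bound stated in the abstract.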

Published in

Proceedings of the VLDB Endowment Volume 7, Issue 1
September 2013
96 pages
ISSN: 2150-8097
Publisher
VLDB Endowment