Abstract
Similarity assessment is one of the core tasks in hyperlink analysis. With the proliferation of applications such as web search and collaborative filtering, SimRank has become a well-studied measure of similarity between two nodes in a graph. It recursively follows the philosophy that "two nodes are similar if they are referenced (have incoming edges) from similar nodes", which can be viewed as an aggregation of similarities based on incoming paths. Despite its popularity, SimRank has an undesirable property, namely "zero-similarity": it only accommodates paths of equal length from a common "center" node, so a large portion of other paths are fully ignored. This paper attempts to remedy this issue. (1) We propose and rigorously justify SimRank*, a revised version of SimRank, which resolves such counter-intuitive "zero-similarity" issues while inheriting the merits of the basic SimRank philosophy. (2) We show that the series form of SimRank* can be reduced to a fairly succinct and elegant closed form, which looks even simpler than SimRank, yet enriches semantics without suffering from increased computational cost. This leads to a fixed-point iterative paradigm of SimRank* in O(Knm) time on a graph of n nodes and m edges for K iterations, which is comparable to SimRank. (3) To further optimize SimRank* computation, we leverage a novel clustering strategy via edge concentration. Because the edge concentration problem is NP-hard, we devise an efficient and effective heuristic to speed up SimRank* computation to O(Knm̃) time, where m̃ is generally much smaller than m. (4) Using real and synthetic data, we empirically verify the rich semantics of SimRank*, and demonstrate its high computational efficiency.
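The "zero-similarity" issue can be illustrated with a minimal sketch of the basic SimRank recursion (the original measure, not SimRank*). The function name `simrank`, the decay factor `C = 0.8`, the iteration count, and the two toy graphs are assumptions chosen for illustration; this naive version loops over all node pairs and does not attempt the O(Knm) bound mentioned above.

```python
from itertools import product


def simrank(in_neighbors, C=0.8, K=10):
    """Naive fixed-point iteration of the basic SimRank recursion.

    in_neighbors maps every node to the list of nodes with an edge into it.
    C (decay factor) and K (iterations) are illustrative defaults.
    """
    nodes = list(in_neighbors)
    # Start from s(a, a) = 1 and s(a, b) = 0 for a != b.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(K):
        nxt = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                nxt[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without incoming edges is similar only to itself.
                nxt[(a, b)] = 0.0
                continue
            # Average the similarity over all pairs of in-neighbors.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            nxt[(a, b)] = C * total / (len(ia) * len(ib))
        sim = nxt
    return sim


# Hypothetical toy graphs, each with three nodes.
fork = {"h": [], "a": ["h"], "b": ["h"]}   # h -> a and h -> b
chain = {"h": [], "a": ["h"], "b": ["a"]}  # h -> a -> b

fork_scores = simrank(fork)
chain_scores = simrank(chain)
print(fork_scores[("a", "b")])   # 0.8
print(chain_scores[("a", "b")])  # 0.0
```

In the fork, `a` and `b` are both one step from `h`, so they receive a positive score. In the chain, `a` is one step and `b` two steps from `h`; the incoming paths have unequal length, so basic SimRank assigns zero even though both nodes descend from the same source.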