ABSTRACT
Various link-based similarity measures have been proposed in the literature, such as Adamic/Adar (Ada for short), SimRank, and random walk with restart (RWR). Unlike SimRank and RWR, Ada is a non-recursive measure that exploits only the local graph structure when computing similarity. Motivated by Ada's promising results in various graph-related tasks, and by the fact that SimRank is a recursive generalization of the co-citation measure, we propose AdaSim, a recursive similarity measure based on the Ada philosophy. AdaSim provides accuracy identical to that of Ada on the first iteration, and it is applicable to both directed and undirected graphs. To accelerate the iterative form, we also propose a matrix form that is dramatically faster while producing the exact AdaSim scores. We conduct extensive experiments with five real-world datasets to evaluate the effectiveness and efficiency of AdaSim against existing similarity measures and graph embedding methods on the task of computing node similarity. Our experimental results show that 1) AdaSim significantly improves on the effectiveness of Ada and outperforms the other competitors, 2) its efficiency is comparable to that of SimRank* and better than that of the others, 3) AdaSim is not sensitive to parameter tuning, and 4) similarity measures are better than embedding methods at computing node similarity.
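For readers unfamiliar with the non-recursive baseline, the following is a minimal sketch of the classical Adamic/Adar score on an undirected graph: the sum of 1/log(degree) over the common neighbors of two nodes. It illustrates only the local measure that AdaSim builds on, not the AdaSim recursion itself. The function name `adamic_adar`, the dict-of-sets graph representation, and the toy graph are assumptions made for illustration.

```python
import math


def adamic_adar(adj, u, v):
    """Classical Adamic/Adar score for nodes u and v.

    adj is an assumed representation: a dict mapping each node to the
    set of its neighbors in an undirected graph.
    """
    common = adj[u] & adj[v]
    # Each common neighbor w contributes 1 / log(degree(w)), so
    # low-degree shared neighbors count more than high-degree ones.
    # Degree-1 nodes are skipped to avoid dividing by log(1) = 0.
    return sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1)


# Hypothetical toy graph: a and b share the neighbors c and d.
graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "e"},
    "d": {"a", "b"},
    "e": {"c"},
}

score_ab = adamic_adar(graph, "a", "b")
score_ae = adamic_adar(graph, "a", "e")
```

Here `score_ab` sums the contributions of `c` (degree 3) and `d` (degree 2), while `score_ae` has only `c` as a common neighbor. The score depends only on direct neighbors, which is the sense in which the measure is local and non-recursive.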
Index Terms
- AdaSim: A Recursive Similarity Measure in Graphs