
Accuracy estimate and optimization techniques for SimRank computation

Authors:
Dmitry Lizorkin, Institute for System Programming of the Russian Academy of Sciences
Pavel Velikhov, Institute for System Programming of the Russian Academy of Sciences
Maxim Grinev, Institute for System Programming of the Russian Academy of Sciences
Denis Turdakov, Institute for System Programming of the Russian Academy of Sciences

Proceedings of the VLDB Endowment, Volume 1, Issue 1, pp. 422–433, https://doi.org/10.14778/1453856.1453904

Published: 01 August 2008


Abstract

Measuring similarity between objects is useful in many areas of computer science, including information retrieval. SimRank is a simple and intuitive similarity measure based on a graph-theoretic model. SimRank is typically computed iteratively, in the spirit of PageRank. However, existing work on SimRank provides no accuracy estimate for the iterative computation and has discouraging time complexity.
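For orientation, the iterative computation mentioned above can be sketched as follows. This is a minimal, unoptimized illustration of the SimRank recurrence of Jeh and Widom, not the optimized method of this paper. The function name `simrank_naive`, the dictionary-based graph representation, and the decay factor `C = 0.8` are assumptions made for the example.

```python
from itertools import product


def simrank_naive(in_neighbors, C=0.8, iterations=5):
    """Naive iterative SimRank (illustrative sketch, not the paper's method).

    in_neighbors: dict mapping each node to the list of its in-neighbors.
    C: decay factor in (0, 1); 0.8 is an assumed value for the example.
    """
    nodes = list(in_neighbors)
    # Initial scores: a node is fully similar to itself, not at all to others.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors has zero similarity to others.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by C.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Tiny example graph: one node pointing at two others.
graph = {"univ": [], "profA": ["univ"], "profB": ["univ"]}
scores = simrank_naive(graph)
```

Each iteration visits every pair of nodes and, for each pair, every pair of their in-neighbors, which is where the high worst-case cost of the straightforward approach comes from.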

In this paper we present a technique for estimating the accuracy of iteratively computed SimRank. The technique gives the number of iterations required to achieve a desired accuracy. We also present optimization techniques that improve the computational complexity of the iterative algorithm from O(n⁴) to O(n³) in the worst case. Finally, we introduce a threshold-sieving heuristic, together with an accuracy estimate for it, that further improves the efficiency of the method.
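To make the idea of choosing an iteration count concrete, here is a small sketch. It assumes that the error after k iterations is bounded by C^(k+1), where C is the decay factor; the abstract does not state the bound itself, so treat this form as an assumption of the example. The function name `iterations_for_accuracy` is likewise hypothetical.

```python
def iterations_for_accuracy(C, eps):
    """Smallest k >= 0 such that C**(k + 1) <= eps.

    Assumes (for illustration) that the difference between the exact SimRank
    score and the k-th iterate is at most C**(k + 1).
    """
    if not 0.0 < C < 1.0:
        raise ValueError("decay factor C must lie strictly between 0 and 1")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    k = 0
    # Increase k until the assumed error bound drops to the target accuracy.
    while C ** (k + 1) > eps:
        k += 1
    return k


# Iterations needed for accuracy 0.001 with an assumed decay factor of 0.8.
needed = iterations_for_accuracy(0.8, 1e-3)
```

Under this assumed bound the required number of iterations depends only on the decay factor and the target accuracy, not on the size or shape of the graph.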

As a practical illustration of our techniques, we computed SimRank scores on a subset of the English Wikipedia corpus, consisting of the complete set of articles and category links.


Published in
Proceedings of the VLDB Endowment, Volume 1, Issue 1
August 2008
1216 pages
ISSN: 2150-8097
Editors: Peter Buneman, Beng Chin Ooi, Kenneth Ross, Gerald Weber
Publisher: VLDB Endowment