DOI: 10.1145/1645953.1646025
research-article

P-Rank: a comprehensive structural similarity measure over information networks

Published: 02 November 2009

ABSTRACT

With the ubiquity of information networks and their broad applications, the issue of similarity computation between entities of an information network has arisen and drawn extensive research interest. However, measuring effectively and comprehensively "how similar two entities are within an information network" is nontrivial, and the problem becomes even more challenging when the information network to be examined is massive and diverse. In this paper, we propose a new similarity measure, P-Rank (Penetrating Rank), for effectively computing the structural similarities of entities in real information networks. P-Rank enriches the well-known similarity measure SimRank by jointly encoding both in- and out-link relationships into structural similarity computation. P-Rank is proven to be a unified structural similarity framework, under which all state-of-the-art similarity measures, including CoCitation, Coupling, Amsler and SimRank, are special cases. Based on the recursive nature of P-Rank, we propose a fixed point algorithm to reinforce structural similarity of vertex pairs beyond the localized neighborhood scope toward the entire information network. Our experimental studies demonstrate the power of P-Rank as an effective similarity measure in different information networks. Meanwhile, under the same time/space complexity, P-Rank outperforms SimRank as a comprehensive and more meaningful structural similarity measure, especially in large real information networks.
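
To make the recursive idea in the abstract concrete, the following is a minimal Python sketch of a fixed point iteration that combines in-link and out-link evidence. The abstract does not give the recurrence, so everything here is an assumption for illustration: the function name `p_rank`, the weight `lam` (balance between in-links and out-links), the damping factor `decay`, and the neighbor-averaging update. It is not the authors' implementation, and it stores all vertex pairs, so it only suits small graphs.

```python
from itertools import product


def p_rank(edges, nodes, lam=0.5, decay=0.8, iterations=10):
    """Iterate a P-Rank-style recurrence toward an approximate fixed point.

    edges: iterable of (source, target) pairs of a directed graph.
    lam:   assumed weight of in-link evidence; (1 - lam) weights out-links.
    decay: assumed damping factor in (0, 1).
    """
    in_nbrs = {v: [] for v in nodes}
    out_nbrs = {v: [] for v in nodes}
    for src, dst in edges:
        out_nbrs[src].append(dst)
        in_nbrs[dst].append(src)

    # Base case: a vertex is fully similar to itself, and to nothing else yet.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    def avg_pair_sim(xs, ys, current):
        # Mean similarity over all neighbor pairs; 0 if either side is empty.
        if not xs or not ys:
            return 0.0
        total = sum(current[(x, y)] for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    for _ in range(iterations):
        nxt = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                nxt[(a, b)] = 1.0
                continue
            in_part = avg_pair_sim(in_nbrs[a], in_nbrs[b], sim)
            out_part = avg_pair_sim(out_nbrs[a], out_nbrs[b], sim)
            nxt[(a, b)] = decay * (lam * in_part + (1.0 - lam) * out_part)
        sim = nxt
    return sim


if __name__ == "__main__":
    # Toy graph: u links to a and b; a links to w; b links to x.
    toy_nodes = ["u", "a", "b", "w", "x"]
    toy_edges = [("u", "a"), ("u", "b"), ("a", "w"), ("b", "x")]
    for weight in (1.0, 0.5, 0.0):
        scores = p_rank(toy_edges, toy_nodes, lam=weight)
        print(weight, round(scores[("a", "b")], 4))
```

In this sketch, setting `lam=1.0` uses only in-links, which corresponds to a SimRank-style recurrence, while `lam=0.0` uses only out-links. Intermediate values blend both directions, which is the joint encoding the abstract describes.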


      • Published in

        CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
        November 2009
        2162 pages
        ISBN: 9781605585123
        DOI: 10.1145/1645953

        Copyright © 2009 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
