Block-level link analysis

Published: 25 July 2004

ABSTRACT

Link analysis has shown great potential for improving the performance of web search, and PageRank and HITS are two of the most popular algorithms. Most existing link analysis algorithms treat a web page as a single node in the web graph. In most cases, however, a web page covers multiple topics, so the page should not be treated as an atomic node. In this paper, each web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting page-to-block and block-to-page relationships from link structure and page layout analysis, we construct a semantic graph over the WWW in which each node represents exactly one semantic topic. This graph better describes the semantic structure of the web. Based on block-level link analysis, we propose two new algorithms, Block Level PageRank and Block Level HITS, and study their performance extensively using web data.
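
The abstract does not give the formulas, so the following is only a minimal sketch of how page-to-block and block-to-page relationships could be combined into a page graph and ranked. The matrix names `page_to_block` and `block_to_page`, the toy weights, the damping factor of 0.85, and the plain power iteration are all illustrative assumptions, not the paper's definitions.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def pagerank(transition, damping=0.85, iterations=100):
    """Standard PageRank power iteration on a row-stochastic matrix."""
    n = len(transition)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [(1.0 - damping) / n
                + damping * sum(rank[i] * transition[i][j] for i in range(n))
                for j in range(n)]
    return rank


# Toy web (hypothetical): 3 pages and 4 blocks.
# Page 0 contains blocks 0 and 1, page 1 contains block 2, page 2 contains block 3.

# page_to_block[p][b]: assumed weight of block b within page p (rows sum to 1).
page_to_block = [
    [0.8, 0.2, 0.0, 0.0],  # block 0 is the main content, block 1 is minor
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# block_to_page[b][p]: assumed probability of following a link
# from block b to page p (rows sum to 1).
block_to_page = [
    [0.0, 1.0, 0.0],  # block 0 links to page 1
    [0.0, 0.0, 1.0],  # block 1 links to page 2
    [1.0, 0.0, 0.0],  # block 2 links to page 0
    [1.0, 0.0, 0.0],  # block 3 links to page 0
]

# Composing the two relationships gives page-to-page transition weights
# in which a link inherits the weight of the block that contains it.
page_graph = matmul(page_to_block, block_to_page)
scores = pagerank(page_graph)
```

In this toy setup page 1 is linked from the heavily weighted block of page 0 and page 2 only from the minor block, so the block weights, rather than the raw link counts, decide which of the two ranks higher.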


Published in

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
July 2004
624 pages
ISBN: 1581138814
DOI: 10.1145/1008992

Copyright © 2004 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
