ABSTRACT
Link analysis has shown great potential for improving the performance of web search, and PageRank and HITS are two of the most popular link analysis algorithms. Most existing link analysis algorithms treat a web page as a single node in the web graph. In most cases, however, a web page covers multiple topics, so the page should not be treated as an atomic node. In this paper, each web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting page-to-block and block-to-page relationships from link structure and page layout analysis, we construct a semantic graph over the WWW in which each node represents exactly one semantic topic. This graph describes the semantic structure of the web better than a page-level graph. Based on block-level link analysis, we propose two new algorithms, Block Level PageRank and Block Level HITS, and study their performance extensively using web data.
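To make the page-to-block and block-to-page idea concrete, the following is a minimal sketch rather than the algorithm as specified in the paper. It assumes that a page-level graph can be formed by multiplying a page-to-block weight matrix with a block-to-page link matrix, and that ordinary PageRank power iteration is then run on the result. The names `page_to_block`, `block_to_page`, `matmul`, and `pagerank`, the toy weights, and the damping value of 0.85 are all illustrative assumptions, not taken from the abstract.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def pagerank(w, damping=0.85, iters=100):
    """Power iteration on a row-stochastic matrix w (pages x pages)."""
    n = len(w)
    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [(1.0 - damping) / n
                + damping * sum(rank[i] * w[i][j] for i in range(n))
                for j in range(n)]
    return rank


# Toy data (hypothetical): 3 pages, 4 blocks.
# page_to_block[p][b]: assumed weight of block b within page p (rows sum to 1).
page_to_block = [
    [0.7, 0.3, 0.0, 0.0],   # page 0 holds blocks 0 and 1
    [0.0, 0.0, 1.0, 0.0],   # page 1 holds block 2
    [0.0, 0.0, 0.0, 1.0],   # page 2 holds block 3
]
# block_to_page[b][p]: assumed share of block b's out-links that point to page p.
block_to_page = [
    [0.0, 1.0, 0.0],        # block 0 links to page 1
    [0.0, 0.5, 0.5],        # block 1 links to pages 1 and 2
    [1.0, 0.0, 0.0],        # block 2 links to page 0
    [1.0, 0.0, 0.0],        # block 3 links to page 0
]

# Page-to-page transitions routed through blocks, then ranked.
page_graph = matmul(page_to_block, block_to_page)
scores = pagerank(page_graph)
```

In this toy setting, a link placed in a heavily weighted block passes more score to its target than the same link placed in a lightly weighted block, which is the intuition behind treating blocks, not whole pages, as the units of link analysis.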
Index Terms
- Block-level link analysis