ABSTRACT
As the WWW grows at an increasing speed, classifiers targeted at hypertext are in high demand. While document categorization is quite mature, the use of hypertext structure and hyperlinks has been relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization using hyperlinks. In comparison with a recently proposed technique that appears to be the only one of its kind, we obtained up to an 18.5% improvement in effectiveness while dramatically reducing processing time. We attempt to explain through experiments what factors contribute to the improvement.
Index Terms
- A practical hypertext categorization method using links and incrementally available class information