DOI: 10.1145/345508.345594

A practical hypertext categorization method using links and incrementally available class information

Published: 01 July 2000

ABSTRACT

As the WWW grows at an increasing speed, classifiers targeted at hypertext are in high demand. While document categorization is a fairly mature field, the use of hypertext structure and hyperlinks has been relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization using hyperlinks. Compared with a recently proposed technique that appears to be the only one of its kind, we obtained up to an 18.5% improvement in effectiveness while dramatically reducing the processing time. We attempt to explain through experiments which factors contribute to the improvement.

            • Published in

              SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
              July 2000
              396 pages
              ISBN:1581132263
              DOI:10.1145/345508

              Copyright © 2000 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Qualifiers

              • Article

              Acceptance Rates

              Overall Acceptance Rate: 792 of 3,983 submissions, 20%
