DOI: 10.1145/345508.345572

Text filtering by boosting naive Bayes classifiers

Published: 01 July 2000

ABSTRACT

Several machine learning algorithms have recently been used for text categorization and filtering. In particular, boosting methods such as AdaBoost have shown good performance when applied to real text data. However, most existing boosting algorithms are based on classifiers that use binary-valued features, so they do not make full use of the weight information provided by standard term weighting methods. In this paper, we present a boosting-based learning method for text filtering that uses naive Bayes classifiers as weak learners. The use of naive Bayes allows the boosting algorithm to utilize term frequency information while maintaining a probabilistically accurate confidence ratio. Applied to TREC-7 and TREC-8 filtering track documents, the proposed method obtained a significant improvement in the LF1, LF2, F1 and F3 measures compared to the best results submitted by other TREC entries.
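
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: a boosting loop whose weak learner is a naive Bayes classifier trained on term frequencies rather than binary features. It assumes plain discrete AdaBoost with labels +1/-1 and a weighted multinomial naive Bayes with Laplace smoothing; the paper's actual weak learner, confidence computation, and weight update may differ. All function names (`train_nb`, `nb_log_odds`, `adaboost_nb`, `predict`) are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels, weights, vocab):
    """Weighted multinomial naive Bayes with Laplace smoothing.

    docs: list of Counter(term -> term frequency); labels: +1 or -1.
    weights: boosting distribution over examples (sums to 1).
    """
    n = len(docs)
    prior = {+1: 0.0, -1: 0.0}
    counts = {+1: Counter(), -1: Counter()}
    for doc, y, w in zip(docs, labels, weights):
        prior[y] += w
        for term, tf in doc.items():
            # Assumption: scale by n so uniform weights reproduce unweighted counts.
            counts[y][term] += w * n * tf
    model = {}
    for y in (+1, -1):
        total = sum(counts[y].values())
        log_prior = math.log(prior[y] + 1e-12)  # guard against log(0)
        log_like = {t: math.log((counts[y][t] + 1.0) / (total + len(vocab)))
                    for t in vocab}
        model[y] = (log_prior, log_like)
    return model


def nb_log_odds(model, doc):
    """Log-odds of the positive class; terms outside the vocabulary are ignored."""
    score = {}
    for y in (+1, -1):
        log_prior, log_like = model[y]
        score[y] = log_prior + sum(tf * log_like[t]
                                   for t, tf in doc.items() if t in log_like)
    return score[+1] - score[-1]


def nb_predict(model, doc):
    return 1 if nb_log_odds(model, doc) >= 0 else -1


def adaboost_nb(docs, labels, rounds=10):
    """Discrete AdaBoost using weighted naive Bayes as the weak learner."""
    n = len(docs)
    vocab = sorted({t for d in docs for t in d})
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = train_nb(docs, labels, weights, vocab)
        preds = [nb_predict(model, d) for d in docs]
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
        if err >= 0.5:  # weak learner is no better than chance: stop
            break
        perfect = err == 0.0
        err = max(err, 1e-10)  # avoid division by zero in alpha
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, model))
        if perfect:  # nothing left to reweight
            break
        # Increase the weight of misclassified documents, then renormalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, y in zip(weights, preds, labels)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble


def predict(ensemble, doc):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    vote = sum(alpha * nb_predict(model, doc) for alpha, model in ensemble)
    return 1 if vote >= 0 else -1
```

Because each document is a term-frequency `Counter`, a term that occurs three times contributes three times its log-likelihood to the score, which is the kind of weight information a binary-feature weak learner would discard. A confidence-rated variant could vote with `nb_log_odds` directly instead of its sign; that choice is left open here.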


Published in

SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
July 2000
396 pages
ISBN: 1581132263
DOI: 10.1145/345508

Copyright © 2000 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
