Text categorization based on combination of modified back propagation neural network and latent semantic analysis

  • Original Article
Neural Computing and Applications

Abstract

This paper proposes a new text categorization model based on the combination of a modified back propagation neural network (MBPNN) and latent semantic analysis (LSA). The traditional back propagation neural network (BPNN) trains slowly and is prone to becoming trapped in a local minimum, which leads to poor performance and efficiency. We propose the MBPNN to accelerate the training of the BPNN and improve categorization accuracy. LSA overcomes the problems of representing documents by individual words by using statistically derived conceptual indices instead. It constructs a conceptual vector space in which each term or document is represented as a vector. This not only greatly reduces the dimensionality but also uncovers important associative relationships between terms. We test our categorization model on the 20 Newsgroups corpus and the Reuters-21578 corpus. Experimental results show that the MBPNN is much faster than the traditional BPNN and also improves on its categorization performance. Applying LSA in our system yields dramatic dimensionality reduction while achieving good classification results.
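
The conceptual vector space described in the abstract is commonly built with a truncated singular value decomposition of a term-by-document matrix. The following is a minimal sketch of that general technique, not the authors' implementation: the toy matrix, the choice of k = 2, and the helper names `lsa_project` and `cosine` are all assumptions made for illustration.

```python
import numpy as np

def lsa_project(term_doc, k):
    """Rank-k LSA of a term-by-document matrix (hypothetical helper).

    Returns one k-dimensional row vector per term and per document.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    term_vectors = U_k * s_k    # scale each latent dimension by its singular value
    doc_vectors = Vt_k.T * s_k
    return term_vectors, doc_vectors

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy count matrix (assumed data): rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "fruit", "apple"]
X = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 1, 0, 0],  # engine
    [0, 0, 0, 1, 1],  # fruit
    [0, 0, 0, 1, 1],  # apple
], dtype=float)

term_vecs, doc_vecs = lsa_project(X, k=2)

# "car" and "automobile" share only one document in the raw counts,
# but land close together in the reduced space.
raw_sim = cosine(X[0], X[1])
lsa_sim = cosine(term_vecs[0], term_vecs[1])
unrelated_sim = cosine(term_vecs[0], term_vecs[3])  # "car" vs "fruit"
```

In this toy case each document shrinks from five term counts to two latent coordinates, and the two vehicle terms become more similar than their raw co-occurrence suggests. That is the kind of dimensionality reduction and term association the abstract attributes to LSA; the reduced document vectors would then serve as classifier inputs.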


Author information

Corresponding author

Correspondence to Bo Yu.

About this article

Cite this article

Wang, W., Yu, B. Text categorization based on combination of modified back propagation neural network and latent semantic analysis. Neural Comput & Applic 18, 875–881 (2009). https://doi.org/10.1007/s00521-008-0193-3

  • DOI: https://doi.org/10.1007/s00521-008-0193-3
