Automatic Detection of Suspicious Bangla Text Using Logistic Regression

  • Conference paper
Intelligent Computing and Optimization (ICO 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1072)

Abstract

Suspicious Bangla text detection is a text classification problem: assigning Bangla texts to suspicious and non-suspicious categories. In this paper, we propose a machine-learning-based system that classifies Bangla texts as suspicious or non-suspicious. For this purpose, a corpus was developed and the logistic regression algorithm is used for the classification task. To measure the effectiveness of the proposed system, its accuracy is compared with that of other algorithms: Naive Bayes, SVM, KNN, and decision tree. Experiments with 1500 training documents and 500 testing documents show that logistic regression achieves the highest accuracy (92%) among the algorithms compared.
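
The abstract does not state the features, preprocessing, or hyperparameters used, so the following is only a minimal sketch of binary text classification with logistic regression, not the authors' implementation. It assumes whitespace tokenization, bag-of-words counts, and batch gradient descent; the function names and the toy English sentences are hypothetical placeholders standing in for a labeled Bangla corpus.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain whitespace tokenization, not the paper's preprocessing.
    return text.lower().split()

def build_vocab(docs):
    # Map each distinct token to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(doc, vocab):
    # Sparse bag-of-words counts {feature_index: count}; unseen words are ignored.
    counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    # P(label = 1 | x) under the logistic regression model.
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))

def train(docs, labels, epochs=200, lr=0.5):
    # Batch gradient descent on the mean log-loss (hyperparameters are assumptions).
    vocab = build_vocab(docs)
    X = [featurize(d, vocab) for d in docs]
    weights, bias, n = [0.0] * len(vocab), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(vocab), 0.0
        for x, y in zip(X, labels):
            err = predict_proba(x, weights, bias) - y
            grad_b += err
            for i, v in x.items():
                grad_w[i] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return vocab, weights, bias

def classify(doc, model):
    # 1 = suspicious, 0 = non-suspicious, thresholded at probability 0.5.
    vocab, weights, bias = model
    return 1 if predict_proba(featurize(doc, vocab), weights, bias) >= 0.5 else 0

def accuracy(docs, labels, model):
    return sum(classify(d, model) == y for d, y in zip(docs, labels)) / len(docs)

# Hypothetical toy data (English placeholders, not from the paper's corpus).
train_docs = [
    "plan attack on the market tonight",
    "we will bomb the station",
    "threat to attack the city",
    "lovely weather at the market today",
    "the station opens a new cafe",
    "city festival starts tonight",
]
train_labels = [1, 1, 1, 0, 0, 0]
model = train(train_docs, train_labels)
```

Calling `accuracy(test_docs, test_labels, model)` on a held-out split gives the kind of figure the abstract reports. The toy data above is far too small for its accuracy to be comparable to the 92% obtained on 500 test documents.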

Author information

Corresponding author

Correspondence to Mohammed Moshiul Hoque.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sharif, O., Hoque, M.M. (2020). Automatic Detection of Suspicious Bangla Text Using Logistic Regression. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2019. Advances in Intelligent Systems and Computing, vol 1072. Springer, Cham. https://doi.org/10.1007/978-3-030-33585-4_57
