Automatic Detection of Suspicious Bangla Text Using Logistic Regression

Sharif, Omar; Hoque, Mohammed Moshiul

doi:10.1007/978-3-030-33585-4_57

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1072)

Included in the following conference series:

International Conference on Intelligent Computing & Optimization

Abstract

Suspicious Bangla text detection is a text classification problem: assigning Bangla texts to suspicious and non-suspicious categories. In this paper, we propose a machine-learning-based system that classifies Bangla texts as suspicious or non-suspicious. For this purpose, a corpus is developed and the logistic regression algorithm is used for the classification task. To measure the effectiveness of the proposed system, its accuracy is compared with that of other algorithms: Naive Bayes, SVM, KNN, and decision tree. Experimental results with 1500 training documents and 500 testing documents show that logistic regression provides the highest accuracy (92%) among the algorithms compared.
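
As a rough illustration of the kind of classifier the abstract describes, the following is a minimal sketch of binary logistic regression over bag-of-words counts, trained with stochastic gradient descent. It is not the authors' implementation: the abstract does not state the feature representation, so raw token counts are an assumption, and the toy English tokens, labels, learning rate, and epoch count are hypothetical placeholders standing in for the Bangla corpus.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain whitespace tokenization; real Bangla text would
    # need proper normalization and tokenization.
    return text.split()

def build_vocab(docs):
    # Map each distinct training token to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(doc, vocab):
    # Sparse bag-of-words vector: {feature index: count}; unknown tokens are dropped.
    counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    # Probability that the document belongs to class 1 (suspicious).
    return sigmoid(bias + sum(weights[i] * v for i, v in x.items()))

def train(X, y, n_features, lr=0.5, epochs=200):
    # Stochastic gradient descent on the log-loss; lr and epochs are arbitrary choices.
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = predict_proba(x, weights, bias) - label
            for i, v in x.items():
                weights[i] -= lr * err * v
            bias -= lr * err
    return weights, bias

def accuracy(X, y, weights, bias):
    # Fraction of documents whose thresholded prediction matches the label.
    hits = sum(int(predict_proba(x, weights, bias) >= 0.5) == label
               for x, label in zip(X, y))
    return hits / len(y)

# Hypothetical toy data: 1 = suspicious, 0 = non-suspicious.
train_docs = ["attack bomb threat", "threat kill attack",
              "cricket match today", "weather nice today"]
train_labels = [1, 1, 0, 0]
test_docs = ["bomb threat", "cricket today"]
test_labels = [1, 0]

vocab = build_vocab(train_docs)
X_train = [vectorize(d, vocab) for d in train_docs]
X_test = [vectorize(d, vocab) for d in test_docs]
weights, bias = train(X_train, train_labels, len(vocab))
test_accuracy = accuracy(X_test, test_labels, weights, bias)
print(f"test accuracy: {test_accuracy:.2f}")
```

The same train/test split and accuracy function could be reused to compare other classifiers on identical features, which is the style of comparison the abstract reports.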

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong, 4349, Bangladesh
Omar Sharif & Mohammed Moshiul Hoque

Corresponding author

Correspondence to Mohammed Moshiul Hoque.

Editor information

Editors and Affiliations

Department of Fundamental and Applied Sciences, Universiti Teknologi Petronas, Tronoh, Perak, Malaysia
Pandian Vasant
Computer Science, FEI, VSB-TU Ostrava, Ostrava, Czech Republic
Ivan Zelinka
Faculty of Engineering Management, Poznan University of Technology, Poznan, Poland
Gerhard-Wilhelm Weber

About this paper

Cite this paper

Sharif, O., Hoque, M.M. (2020). Automatic Detection of Suspicious Bangla Text Using Logistic Regression. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2019. Advances in Intelligent Systems and Computing, vol 1072. Springer, Cham. https://doi.org/10.1007/978-3-030-33585-4_57

DOI: https://doi.org/10.1007/978-3-030-33585-4_57
Published: 27 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33584-7
Online ISBN: 978-3-030-33585-4
eBook Packages: Intelligent Technologies and Robotics (R0)
