Abstract
Suspicious Bangla text detection is a text classification problem that involves classifying Bangla texts into suspicious and non-suspicious categories. In this paper, we propose a machine learning based system that classifies Bangla texts as suspicious or non-suspicious. For this purpose, a corpus is developed and the logistic regression algorithm is used for the classification task. To measure the effectiveness of the proposed system, its accuracy is also compared with that of other algorithms: Naive Bayes, SVM, KNN, and decision tree. Experimental results with 1500 training documents and 500 testing documents show that logistic regression achieves the highest accuracy (92%) among the algorithms compared.
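The abstract does not describe the implementation, so the following is only a minimal, standard-library sketch of binary logistic regression for text classification. It assumes whitespace tokenization, bag-of-words count features, and batch gradient descent on an L2-regularized log loss. The toy Bangla documents, their labels, and the hyperparameters (`lr`, `epochs`, `l2`) are hypothetical and are not taken from the paper's corpus. Label 1 stands for suspicious and 0 for non-suspicious.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Bangla preprocessing
    # (normalization, stop-word removal, stemming) may differ.
    return text.split()


def build_vocab(docs):
    # Map each distinct token to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def vectorize(doc, vocab):
    # Sparse bag-of-words counts {feature_index: count}; unknown tokens are dropped.
    counts = Counter(tokenize(doc))
    return {vocab[t]: c for t, c in counts.items() if t in vocab}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, n_features, lr=0.5, epochs=200, l2=0.01):
    # Batch gradient descent on mean log loss + (l2 / 2) * ||w||^2 (bias not regularized).
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [l2 * wi for wi in w]
        grad_b = 0.0
        for x, label in zip(X, y):
            z = b + sum(w[j] * v for j, v in x.items())
            err = sigmoid(z) - label  # derivative of log loss w.r.t. z
            for j, v in x.items():
                grad_w[j] += err * v / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    # Label 1 (suspicious) if P(suspicious | x) >= 0.5, else 0.
    z = b + sum(w[j] * v for j, v in x.items())
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(y_true, y_pred):
    # Fraction of documents whose predicted label matches the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, not from the paper's corpus.
train_docs = [
    "বোমা হামলা হবে",   # suspicious
    "হামলা করো এখনই",   # suspicious
    "আজ সুন্দর দিন",    # non-suspicious
    "খেলা দেখতে যাব",   # non-suspicious
]
train_labels = [1, 1, 0, 0]

vocab = build_vocab(train_docs)
X_train = [vectorize(d, vocab) for d in train_docs]
w, b = train_logreg(X_train, train_labels, len(vocab))

test_docs = ["বোমা হামলা", "সুন্দর খেলা"]
test_labels = [1, 0]
preds = [predict(w, b, vectorize(d, vocab)) for d in test_docs]
print(preds, accuracy(test_labels, preds))
```

Accuracy here is the share of test documents classified correctly, which is the metric the abstract reports. In practice, a library implementation such as scikit-learn's `LogisticRegression`, paired with a vectorizer, would likely replace the hand-rolled gradient descent. The same `accuracy` function could then score the baseline classifiers the abstract mentions.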
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Sharif, O., Hoque, M.M. (2020). Automatic Detection of Suspicious Bangla Text Using Logistic Regression. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2019. Advances in Intelligent Systems and Computing, vol 1072. Springer, Cham. https://doi.org/10.1007/978-3-030-33585-4_57
DOI: https://doi.org/10.1007/978-3-030-33585-4_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33584-7
Online ISBN: 978-3-030-33585-4
eBook Packages: Intelligent Technologies and Robotics (R0)