Spam detection in social media using convolutional and long short term memory neural network

Jain, Gauri; Sharma, Manisha; Agarwal, Basant

doi:10.1007/s10472-018-9612-z

Published: 02 January 2019

Volume 85, pages 21–44, (2019)

Annals of Mathematics and Artificial Intelligence

1607 Accesses
97 Citations

Abstract

As Internet use grows, people increasingly connect virtually through platforms such as text messaging, Facebook, and Twitter. This has led to an increase in the spread of unsolicited messages, known as spam, which are used for marketing, for collecting personal information, or simply to offend people. It is therefore crucial to have a robust spam detection architecture that can block such messages. Spam detection on noisy platforms such as Twitter remains difficult because the texts are short and the language used on social media is highly variable. In this paper, we propose a novel deep learning architecture based on a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The model is enhanced by incorporating semantic information into the word representations with the help of knowledge bases such as WordNet and ConceptNet. Using these knowledge bases improves performance by providing better semantic vector representations for test words that would otherwise receive random vectors because they did not appear in the training data. Experimental results on two benchmark datasets show the effectiveness of the proposed approach in terms of accuracy and F1-score.
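
The knowledge-base idea in the abstract is that a test word absent from the training vocabulary need not be given a random vector if WordNet or ConceptNet can link it to related words that were seen in training. The following is a minimal sketch of that lookup, not the authors' implementation. It assumes a plain Python dict `related_words` stands in for WordNet/ConceptNet queries and that the vectors of related training words are averaged element-wise; the abstract does not say how the paper combines them. The function `build_lookup`, the toy vectors, and the 4-dimensional size are hypothetical.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]


def build_lookup(
    train_vectors: Dict[str, Vector],
    related_words: Dict[str, List[str]],
    dim: int,
    seed: int = 0,
) -> Callable[[str], Vector]:
    """Map a token to an embedding, using a knowledge base for unseen words.

    train_vectors: embeddings for words seen during training.
    related_words: hypothetical stand-in for WordNet/ConceptNet queries,
        mapping a word to semantically related words.
    """
    rng = random.Random(seed)
    cache: Dict[str, Vector] = {}

    def lookup(word: str) -> Vector:
        if word in train_vectors:      # seen in training: use the learned vector
            return train_vectors[word]
        if word in cache:              # unseen, but already resolved earlier
            return cache[word]
        # Collect vectors of related words that *were* seen in training.
        known = [train_vectors[w] for w in related_words.get(word, [])
                 if w in train_vectors]
        if known:
            # Assumption: combine related vectors by element-wise averaging.
            vec = [sum(col) / len(known) for col in zip(*known)]
        else:
            # No usable relation: fall back to a small random vector,
            # which is the behaviour the knowledge-base step tries to avoid.
            vec = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
        cache[word] = vec
        return vec

    return lookup


# Toy 4-dimensional vectors (made-up values for illustration only).
train_vectors = {
    "free": [0.9, 0.1, 0.0, 0.2],
    "prize": [0.8, 0.3, 0.1, 0.0],
    "meeting": [0.0, 0.7, 0.6, 0.1],
}
related = {"award": ["prize", "reward"], "gift": ["free", "prize"]}

lookup = build_lookup(train_vectors, related, dim=4)
print(lookup("award"))  # [0.8, 0.3, 0.1, 0.0]: "reward" is unseen, so only "prize" counts
print(lookup("gift"))   # element-wise mean of the "free" and "prize" vectors
print(lookup("xyzzy"))  # no related words: small random vector
```

In the full architecture, vectors obtained this way would form the input sequence to the convolutional and LSTM layers; those layers are not sketched here.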

Similar content being viewed by others

Spam detection on social networks using deep contextualized word representation

Article 14 July 2022

Optimizing semantic LSTM for spam detection

Article 12 April 2018

Fake news detection using recurrent neural network based on bidirectional LSTM and GloVe

Article 10 February 2024


Author information

Authors and Affiliations

Department of Computer Science, Banasthali Vidyapith, Banasthali, India
Gauri Jain & Manisha Sharma
Department of Computer Science and Engineering, Swami Keshvanand Institute of Technology, Jaipur, India
Basant Agarwal

Corresponding author

Correspondence to Gauri Jain.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jain, G., Sharma, M. & Agarwal, B. Spam detection in social media using convolutional and long short term memory neural network. Ann Math Artif Intell 85, 21–44 (2019). https://doi.org/10.1007/s10472-018-9612-z

Published: 02 January 2019
Issue Date: January 2019
DOI: https://doi.org/10.1007/s10472-018-9612-z
