DOI: 10.1145/3014812.3014815
research-article

Twitter spam detection based on deep learning

Published: 31 January 2017

ABSTRACT

Twitter spam has long been a critical but difficult problem to address. So far, researchers have developed a series of machine learning-based methods and blacklisting techniques to detect spamming activities on Twitter. According to our investigation, current methods and techniques achieve an accuracy of around 80%. However, due to the problems of spam drift and information fabrication, these machine learning-based methods cannot efficiently detect spam activities in real-life scenarios. Moreover, blacklisting cannot keep up with the variations of spamming activities, as manually inspecting suspicious URLs is extremely time-consuming. In this paper, we propose a novel technique based on deep learning to address the above challenges. The syntax of each tweet is learned through the WordVector Training Mode. We then construct a binary classifier based on the resulting representation dataset. In experiments, we collected a 10-day dataset of real tweets and used it to evaluate our proposed method. We first studied the performance of different classifiers, and then compared our method to other existing text-based methods. We found that our method largely outperformed existing methods. We further compared our method to non-text-based detection techniques. According to the experimental results, our proposed method was more accurate.
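The two-stage pipeline described in the abstract (represent each tweet as a vector, then train a binary classifier on those vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the word vectors in `TOY_VECTORS` are hand-picked and hypothetical rather than learned from data, averaging is assumed as the way to combine word vectors into a tweet vector, and logistic regression is an assumed stand-in for whichever classifier the paper evaluates. All function names and training tweets are invented for illustration.

```python
import math

# Hypothetical 2-dimensional word vectors, hand-picked for illustration.
# In the paper's setting these would be learned from tweets, not written by hand.
TOY_VECTORS = {
    "free": (0.9, 0.1),
    "win": (0.8, 0.2),
    "click": (0.85, 0.05),
    "prize": (0.9, 0.2),
    "lunch": (0.1, 0.9),
    "meeting": (0.05, 0.8),
    "friends": (0.2, 0.85),
    "today": (0.3, 0.6),
}


def tweet_vector(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Fit a logistic-regression classifier with batch gradient descent."""
    dim = len(samples[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            # Prediction error for this sample (predicted probability minus label).
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_spam(text, weights, bias):
    """Return True when the predicted spam probability is at least 0.5."""
    x = tweet_vector(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


# Invented toy training set: label 1 marks spam, label 0 marks non-spam.
TRAIN = [
    ("win free prize", 1),
    ("click free win", 1),
    ("click prize today", 1),
    ("lunch meeting today", 0),
    ("friends lunch", 0),
    ("meeting friends today", 0),
]
samples = [tweet_vector(t) for t, _ in TRAIN]
labels = [y for _, y in TRAIN]
weights, bias = train_logistic(samples, labels)
```

Calling `predict_spam("free prize click", weights, bias)` then classifies a new tweet from its averaged vector. Because the classifier only sees the vector representation, the quality of the learned word vectors largely determines how well such a pipeline could cope with new spam wording.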

Published in

ACSW '17: Proceedings of the Australasian Computer Science Week Multiconference
January 2017
615 pages
ISBN: 9781450347686
DOI: 10.1145/3014812

Copyright © 2017 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

ACSW '17 paper acceptance rate: 78 of 156 submissions (50%)
Overall acceptance rate: 204 of 424 submissions (48%)