research-article

Twitter spam detection based on deep learning

Authors:
Tingmin Wu, Shigang Liu, Jun Zhang, Yang Xiang

Deakin University, Burwood Hwy, Burwood, Australia

ACSW '17: Proceedings of the Australasian Computer Science Week Multiconference, January 2017, Article No. 3, Pages 1–8. https://doi.org/10.1145/3014812.3014815

Published: 31 January 2017

ABSTRACT

Twitter spam has long been a critical but difficult problem. Researchers have developed a range of machine learning-based methods and blacklisting techniques to detect spamming activity on Twitter. According to our investigation, current methods and techniques achieve an accuracy of around 80%. However, because of spam drift and information fabrication, these machine learning-based methods cannot efficiently detect spam in real-life scenarios. Moreover, blacklisting cannot keep up with the variations in spamming activity, because manually inspecting suspicious URLs is extremely time-consuming. In this paper, we propose a novel technique based on deep learning to address these challenges. The syntax of each tweet is learned through WordVector Training Mode. We then construct a binary classifier on the resulting representation dataset. In our experiments, we collected and used a 10-day dataset of real tweets to evaluate the proposed method. We first studied the performance of different classifiers and then compared our method with existing text-based methods; it largely outperformed them. We further compared our method with non-text-based detection techniques. According to the experimental results, our proposed method was more accurate.
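
The two-stage pipeline described in the abstract (represent each tweet as a vector, then train a binary classifier on those vectors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several loudly stated assumptions: the per-word vectors are deterministic pseudo-random placeholders standing in for embeddings that would normally be learned from a tweet corpus, a tweet is represented by the average of its word vectors, the classifier is plain logistic regression, and every name here (`word_vector`, `tweet_vector`, `train`, the toy tweets) is hypothetical.

```python
import math
import random

DIM = 8  # embedding size; an arbitrary choice for this sketch


def word_vector(word):
    """Placeholder embedding: a deterministic pseudo-random vector per word.

    ASSUMPTION: a real system would look up vectors learned from tweets.
    """
    rng = random.Random(word)  # seeding with the word keeps this reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def tweet_vector(text):
    """Represent a tweet as the average of its word vectors."""
    words = text.lower().split()
    if not words:
        return [0.0] * DIM
    vectors = [word_vector(w) for w in words]
    return [sum(column) / len(words) for column in zip(*vectors)]


def spam_probability(weights, bias, features):
    """Logistic model: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(weights, bias, data):
    """Average cross-entropy over (features, label) pairs."""
    total = 0.0
    for features, label in data:
        p = spam_probability(weights, bias, features)
        total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total / len(data)


def train(data, epochs=200, learning_rate=0.1):
    """Fit the logistic model with full-batch gradient descent."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for features, label in data:
            error = spam_probability(weights, bias, features) - label
            grad_b += error
            for i, x in enumerate(features):
                grad_w[i] += error * x
        bias -= learning_rate * grad_b / len(data)
        weights = [w - learning_rate * g / len(data)
                   for w, g in zip(weights, grad_w)]
    return weights, bias


# Hypothetical toy data: label 1 = spam, label 0 = not spam.
toy_tweets = [
    ("win a free phone click this link now", 1),
    ("free followers click here", 1),
    ("cheap pills buy now", 1),
    ("having coffee with friends this morning", 0),
    ("great talk at the conference today", 0),
    ("my train is late again", 0),
]
data = [(tweet_vector(text), label) for text, label in toy_tweets]
initial_loss = mean_log_loss([0.0] * DIM, 0.0, data)
weights, bias = train(data)
final_loss = mean_log_loss(weights, bias, data)
print(f"log loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Swapping the placeholder `word_vector` for a lookup into trained embeddings, and the logistic model for another classifier, leaves the overall shape of the pipeline unchanged, which is the point of the sketch.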

Index Terms

Twitter spam detection based on deep learning
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Social engineering attacks
      1. Phishing
      2. Spoofing attacks

Published in

ACSW '17: Proceedings of the Australasian Computer Science Week Multiconference
January 2017
615 pages
ISBN:9781450347686
DOI:10.1145/3014812

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Twitter spam detection
deep learning
social network security
Qualifiers
- research-article
Acceptance Rates
ACSW '17 paper acceptance rate: 78 of 156 submissions, 50%. Overall acceptance rate: 204 of 424 submissions, 48%.