An Advanced Spam Detection Technique Based on Self-adaptive Piecewise Hash Algorithm

Zhu, Junxing; Li, Aiping

doi:10.1007/978-3-319-11119-3_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8710)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Email spam continues to grow rapidly, and many spam detection algorithms have been developed in response. However, most of these algorithms share several shortcomings. To address them, we present an advanced spam detection technique (ASDT) based on extremum characteristic theory, the Rabin fingerprint algorithm, a modified Bayesian method, and optimization theory. We designed several experiments to evaluate ASDT's accuracy, speed, and robustness by comparing it with SFSPH, SFSPH-S, the well-known DSC algorithm, and the Email Remove-duplicate Algorithm Based on SHA-1 (ERABS). In these experiments, ASDT achieved the best accuracy, speed, and robustness in spam filtering among the compared algorithms.
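
The abstract names its building blocks without describing how they fit together. As a rough illustration only, and not the authors' ASDT, the Python sketch below shows one generic way a piecewise hash can be built: a rolling fingerprint is computed over every byte window, a piece boundary is placed wherever that fingerprint is a strict local maximum (an extremum rule), and each piece is hashed so that two messages can be compared by the pieces they share. Everything here is an assumption made for illustration: the function names, the window and radius values, the use of a simple polynomial rolling hash in place of a true Rabin fingerprint over GF(2), the use of SHA-1 per piece, and the Jaccard score. The modified Bayesian method and the optimization step mentioned in the abstract are not modeled.

```python
import hashlib

BASE = 257                 # assumed polynomial base for the rolling hash
MOD = (1 << 61) - 1        # assumed large prime modulus


def rolling_fingerprints(data: bytes, window: int = 8) -> list[int]:
    """Polynomial rolling hash of every `window`-byte substring of `data`."""
    if len(data) < window:
        return []
    top = pow(BASE, window - 1, MOD)   # weight of the byte leaving the window
    fp = 0
    for b in data[:window]:
        fp = (fp * BASE + b) % MOD
    out = [fp]
    for i in range(window, len(data)):
        # Drop the oldest byte, shift, and append the newest byte.
        fp = ((fp - data[i - window] * top) * BASE + data[i]) % MOD
        out.append(fp)
    return out


def extremum_cut_points(fps: list[int], radius: int = 16) -> list[int]:
    """Indices whose fingerprint is the strict maximum within +/- `radius`."""
    cuts = []
    for i in range(radius, len(fps) - radius):
        neighbourhood = fps[i - radius:i + radius + 1]
        if fps[i] == max(neighbourhood) and neighbourhood.count(fps[i]) == 1:
            cuts.append(i)
    return cuts


def piece_hashes(data: bytes, window: int = 8, radius: int = 16) -> set[str]:
    """Split `data` at extremum boundaries and hash each piece."""
    fps = rolling_fingerprints(data, window)
    # A cut is placed at the end of the window that produced the extremum.
    cuts = [i + window for i in extremum_cut_points(fps, radius)]
    bounds = [0] + cuts + [len(data)]
    return {
        hashlib.sha1(data[a:b]).hexdigest()[:16]
        for a, b in zip(bounds, bounds[1:])
        if b > a
    }


def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity of the two messages' piece-hash sets."""
    ha, hb = piece_hashes(a), piece_hashes(b)
    if not ha and not hb:
        return 1.0
    return len(ha & hb) / len(ha | hb)
```

Because boundaries depend only on nearby content, inserting or deleting text near the start of a message shifts byte offsets but leaves most later pieces, and therefore most piece hashes, unchanged. A caller might flag a message as a near-duplicate of known spam when `similarity(message, known_spam)` exceeds a threshold of its own choosing.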

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, Changsha, 410073, China
Junxing Zhu & Aiping Li

Editor information

Editors and Affiliations

School of Computer Science, National University of Defense Technology, 410073, Changsha, China
Weihong Han
School of Information Technology & Electrical Engineering, University of Queensland, 4107, Brisbane, QLD, Australia
Zi Huang
School of Computer and Communication Engineering, University of Science and Technology Beijing, 100083, Beijing, China
Changjun Hu
School of Computer Science and Technology, Harbin Institute of Technology, 150006, Harbin, China
Hongli Zhang
Institute of Information Engineering, Chinese Academy of Sciences, 100864, Beijing, China
Li Guo

About this paper

Cite this paper

Zhu, J., Li, A. (2014). An Advanced Spam Detection Technique Based on Self-adaptive Piecewise Hash Algorithm. In: Han, W., Huang, Z., Hu, C., Zhang, H., Guo, L. (eds) Web Technologies and Applications. APWeb 2014. Lecture Notes in Computer Science, vol 8710. Springer, Cham. https://doi.org/10.1007/978-3-319-11119-3_14

DOI: https://doi.org/10.1007/978-3-319-11119-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11118-6
Online ISBN: 978-3-319-11119-3
eBook Packages: Computer Science, Computer Science (R0)
