Abstract
Email spam continues to grow rapidly, and many spam detection algorithms have been developed in response. However, most of these algorithms share several shortcomings. To address them, we present an advanced spam detection technique (ASDT) based on extremum characteristic theory, the Rabin fingerprint algorithm, a modified Bayesian method, and optimization theory. We designed several experiments to evaluate ASDT’s accuracy, speed, and robustness by comparing it with SFSPH, SFSPH-S, the well-known DSC algorithm, and the Email Remove-duplicate Algorithm Based on SHA-1 (ERABS). In these experiments, ASDT achieved the best accuracy, speed, and robustness in spam filtering among the compared algorithms.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhu, J., Li, A. (2014). An Advanced Spam Detection Technique Based on Self-adaptive Piecewise Hash Algorithm. In: Han, W., Huang, Z., Hu, C., Zhang, H., Guo, L. (eds) Web Technologies and Applications. APWeb 2014. Lecture Notes in Computer Science, vol 8710. Springer, Cham. https://doi.org/10.1007/978-3-319-11119-3_14
DOI: https://doi.org/10.1007/978-3-319-11119-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11118-6
Online ISBN: 978-3-319-11119-3
eBook Packages: Computer Science, Computer Science (R0)