A Collaborative Abstraction Based Email Spam Filtering with Fingerprints

Wireless Personal Communications

Abstract

Email spam detection remains an active research topic. Although email has come to play a major role in day-to-day activities, the growing volume of spam-borne threats has prompted numerous spam detection techniques. Researchers have adopted many filtering methods, including data mining and machine learning techniques, yet a fully accurate filtering model that can cope with deliberate spam attacks is still needed. This paper proposes a model that takes a hybrid approach to efficient spam detection: a collaborative spam filtering framework that uses an abstraction of the entire email layout, together with fingerprints of that layout, to match and catch spam as it evolves. The collaborative framework uses recommendations from other users to build a spam database. Each incoming email is checked against the spam database and classified as spam or ham using a near-duplicate similarity matching scheme. To reduce the false positive and false negative rates in classification, we calculate cumulative weights from both the email layouts and the fingerprints. Fingerprint signatures of newly classified spam are progressively added to the spam database to keep detection up to date. The system is evaluated on the SpamAssassin dataset, and the results show comparatively better performance.
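
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that the layout abstraction is the ordered sequence of HTML tag names, that fingerprints are hashes of runs of consecutive tags, and that the cumulative weight is a fixed linear combination of the two similarities. Every name and constant here (`SpamDatabase`, `SHINGLE_SIZE`, the 0.5 weights, the 0.6 threshold) is hypothetical.

```python
import hashlib
import re
from difflib import SequenceMatcher

# All constants are illustrative assumptions, not values from the paper.
SHINGLE_SIZE = 3          # number of consecutive tags per fingerprint
LAYOUT_WEIGHT = 0.5       # weight of the whole-layout similarity
FINGERPRINT_WEIGHT = 0.5  # weight of the fingerprint-set similarity
SPAM_THRESHOLD = 0.6      # combined score at or above this means spam

TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)")


def layout_abstraction(html):
    """Reduce an email body to its ordered sequence of HTML tag names."""
    return [m.group(1) + m.group(2).lower() for m in TAG_RE.finditer(html)]


def layout_fingerprints(tags):
    """Hash every run of SHINGLE_SIZE consecutive tags into a short signature."""
    if not tags:
        return set()
    last_start = max(len(tags) - SHINGLE_SIZE, 0)
    shingles = (" ".join(tags[i:i + SHINGLE_SIZE]) for i in range(last_start + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


def layout_similarity(tags_a, tags_b):
    """Similarity of two tag sequences in [0, 1]; empty layouts never match."""
    if not tags_a or not tags_b:
        return 0.0
    return SequenceMatcher(None, tags_a, tags_b).ratio()


def fingerprint_similarity(prints_a, prints_b):
    """Jaccard similarity of two fingerprint sets."""
    union = prints_a | prints_b
    return len(prints_a & prints_b) / len(union) if union else 0.0


class SpamDatabase:
    """Shared store of layouts and fingerprints for emails reported as spam."""

    def __init__(self):
        self.entries = []  # list of (tag_sequence, fingerprint_set)

    def report_spam(self, html):
        """Record a user-reported spam email (the collaborative step)."""
        tags = layout_abstraction(html)
        self.entries.append((tags, layout_fingerprints(tags)))

    def score(self, html):
        """Best combined weight of the email against any known spam entry."""
        tags = layout_abstraction(html)
        prints = layout_fingerprints(tags)
        best = 0.0
        for known_tags, known_prints in self.entries:
            combined = (LAYOUT_WEIGHT * layout_similarity(tags, known_tags)
                        + FINGERPRINT_WEIGHT * fingerprint_similarity(prints, known_prints))
            best = max(best, combined)
        return best

    def classify(self, html):
        """Label an email and add newly detected spam to the database."""
        if self.score(html) >= SPAM_THRESHOLD:
            self.report_spam(html)  # progressive update with the new signature
            return "spam"
        return "ham"
```

In this sketch an email with no HTML tags always scores zero, so a real system would need a separate abstraction for plain-text messages. The weights and threshold would have to be tuned on labelled data such as the SpamAssassin corpus.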

Data Availability

The data that support the findings of this study are available in the SpamAssassin public corpus at spamassassin.apache.org.

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Contributions

All authors made substantial contributions to the conception of the work.

Corresponding author

Correspondence to P. Rajendran.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Code availability

The algorithm of the proposed work is included in this article itself.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rajendran, P., Tamilarasi, A. & Mynavathi, R. A Collaborative Abstraction Based Email Spam Filtering with Fingerprints. Wireless Pers Commun 123, 1913–1923 (2022). https://doi.org/10.1007/s11277-021-09221-5

  • DOI: https://doi.org/10.1007/s11277-021-09221-5
