Malicious URL Detection Using Machine Learning

Predictive Data Security using AI

Part of the book series: Studies in Computational Intelligence (SCI, volume 1065)

Abstract

In recent years, cyberattacks have become more destructive and more targeted. As technology advances, diverse and sophisticated threats are being launched to defraud people. Many web applications struggle to improve the reliability and security of their platforms and to protect users from fraud, revenue loss, or malware. These attacks use malicious uniform resource locators (URLs) to target web users. Such URLs host unwanted content in the form of junk emails, phishing, or unauthorized drive-by downloads. Unsuspecting people click these URLs and become victims of malicious activities such as identity theft (of personal or financial details) and the installation of viruses. Malicious URLs must therefore be detected accurately to resolve these security issues. Traditional protection methods such as blacklisting remain a classical technique for detecting malicious URLs because of their simplicity, but they cannot detect unknown malicious URLs; hence, machine learning approaches are being used to achieve better results. This chapter aims to provide a structured understanding of popular feature extraction techniques and machine learning algorithms.
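
The contrast drawn above between blacklisting and learned detection can be illustrated with a minimal sketch. It is not taken from the chapter: the blacklist entry, the function names, and the particular lexical features are assumptions chosen for illustration, and a real system would feed such features into a trained classifier.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical blacklist; real deployments rely on curated feeds of known-bad URLs.
BLACKLIST = {"http://bad.example/login"}


def is_blacklisted(url: str) -> bool:
    """Exact-match lookup: only URLs already on the list are flagged."""
    return url in BLACKLIST


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of a string; 0.0 for empty input."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_features(url: str) -> dict:
    """Illustrative lexical features computed from the URL string alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(not ch.isalnum() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "host_entropy": shannon_entropy(host),
    }


if __name__ == "__main__":
    known = "http://bad.example/login"
    unseen = "http://bad.example/login2"  # one character off the listed URL
    print(is_blacklisted(known), is_blacklisted(unseen))
    print(lexical_features(unseen))
```

The exact-match lookup misses the second URL even though it differs from the listed one by a single character, whereas the feature vector can be computed for any URL, seen or unseen, which is what allows a classifier to generalize beyond the list.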

Author information

Corresponding author

Correspondence to Mayank Swarnkar.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Swarnkar, M., Sharma, N., Kumar Thakkar, H. (2023). Malicious URL Detection Using Machine Learning. In: Thakkar, H.K., Swarnkar, M., Bhadoria, R.S. (eds) Predictive Data Security using AI. Studies in Computational Intelligence, vol 1065. Springer, Singapore. https://doi.org/10.1007/978-981-19-6290-5_11

  • DOI: https://doi.org/10.1007/978-981-19-6290-5_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-6289-9

  • Online ISBN: 978-981-19-6290-5

  • eBook Packages: Engineering, Engineering (R0)
