Malicious URL Detection Using Machine Learning

Swarnkar, Mayank; Sharma, Neha; Kumar Thakkar, Hiren

doi:10.1007/978-981-19-6290-5_11

Part of the book series: Studies in Computational Intelligence (SCI, volume 1065)

Abstract

In recent years, cyberattacks have become increasingly destructive and targeted. With technological advancements, diverse threats are being launched in sophisticated ways to defraud people. Many web applications have been striving to improve the reliability and security of their platforms to protect users from fraud, revenue loss, or malware. These attacks use malicious uniform resource locators (URLs) to target web users. Such URLs deliver unwanted content such as spam, phishing pages, or unauthorized drive-by downloads. Unsuspecting people who click these URLs can fall victim to anonymous malicious activities such as identity theft (of personal or financial details) and virus installation. It is therefore necessary to detect malicious URLs accurately. Blacklisting remains a classical protection method for detecting malicious URLs because of its simplicity, but it cannot detect previously unknown malicious URLs; hence, machine learning approaches are being used to achieve better results. This chapter aims to provide a structured understanding of popular feature extraction techniques and machine learning algorithms for malicious URL detection.
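To make the contrast between blacklisting and learning-based detection concrete, here is a minimal Python sketch using only the standard library. It compares an exact-match blacklist lookup with a handful of lexical URL features of the kind machine learning detectors often consume. The blacklist entry, the function names, and the particular feature set are illustrative assumptions, not the chapter's own method.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical blacklist entry, for illustration only.
BLACKLIST = {"http://malicious.example/login"}


def in_blacklist(url: str) -> bool:
    """Exact-match lookup: only URLs already on the list are flagged."""
    return url in BLACKLIST


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of a string."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def is_ip_host(host: str) -> bool:
    """True if the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url: str) -> dict:
    """Illustrative lexical features computed from the URL string alone (assumed feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(not ch.isalnum() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(is_ip_host(host)),
        "entropy": shannon_entropy(url),
    }


if __name__ == "__main__":
    variant = "http://malicious.example/login?id=1"
    print(in_blacklist(variant))  # False: a one-parameter variant evades the exact-match list
    print(lexical_features(variant))
```

In practice, feature vectors like these would be computed for labeled benign and malicious URLs and used to train a supervised classifier (for example, a decision tree or an SVM). Unlike the blacklist, a trained model can assign a score to URLs it has never seen.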

Author information

Authors and Affiliations

Indian Institute of Technology (IIT-BHU), Varanasi, India
Mayank Swarnkar & Neha Sharma
Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
Hiren Kumar Thakkar

Corresponding author

Correspondence to Mayank Swarnkar.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
Hiren Kumar Thakkar
Department of Computer Science and Engineering, Indian Institute of Technology BHU, Varanasi, Uttar Pradesh, India
Mayank Swarnkar
Department of Computer Engineering and Applications, GLA University, Mathura, Uttar Pradesh, India
Robin Singh Bhadoria

About this chapter

Cite this chapter

Swarnkar, M., Sharma, N., Kumar Thakkar, H. (2023). Malicious URL Detection Using Machine Learning. In: Thakkar, H.K., Swarnkar, M., Bhadoria, R.S. (eds) Predictive Data Security using AI. Studies in Computational Intelligence, vol 1065. Springer, Singapore. https://doi.org/10.1007/978-981-19-6290-5_11

DOI: https://doi.org/10.1007/978-981-19-6290-5_11
Published: 02 December 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-6289-9
Online ISBN: 978-981-19-6290-5
eBook Packages: Engineering, Engineering (R0)
