Abstract
In recent years, cyberattacks have become more destructive and more targeted. As technology advances, diverse and increasingly sophisticated threats are being launched to defraud people. Many web applications struggle to improve the reliability and security of their platforms and to protect users from fraud, revenue loss, and malware. These attacks use malicious uniform resource locators (URLs) to target web users. Such URLs host unwanted content in the form of junk email, phishing pages, or unauthorized drive-by downloads. Unsuspecting users who click them become victims of unethical activities such as identity theft (loss of personal or financial details) and the installation of viruses. It is therefore necessary to detect malicious URLs accurately. Blacklisting, the traditional protection method, remains widely used for detecting malicious URLs because of its simplicity, but it cannot detect previously unknown malicious URLs; hence, machine learning approaches are being used to achieve better results. This chapter aims to provide a structured understanding of popular feature extraction techniques and machine learning algorithms.
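The contrast between blacklisting and feature-based detection can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the blacklist contents, the function names `is_blacklisted` and `lexical_features`, and the particular features chosen are all assumptions made for illustration. The sketch shows why an exact-match blacklist misses an unseen host, while lexical features can still be computed for any URL and passed to a classifier.

```python
import math
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical blacklist for illustration; real lists hold many known-bad hosts.
BLACKLIST = {"malicious.example"}


def is_blacklisted(url, blacklist=BLACKLIST):
    """Return True only if the URL's host exactly matches a blacklisted host."""
    host = urlsplit(url).hostname or ""
    return host in blacklist


def lexical_features(url):
    """Compute a few simple lexical features from the URL string alone.

    The feature set is an assumption chosen for illustration, not a
    recommended or complete set.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    n = len(url)
    counts = Counter(url)
    # Shannon entropy of the character distribution, in bits per character.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {
        "url_length": n,
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        "char_entropy": entropy,
    }


known = "http://malicious.example/login"
unseen = "http://malicious-2.example/login"
print(is_blacklisted(known), is_blacklisted(unseen))
print(lexical_features(unseen))
```

The unseen host slips past the blacklist lookup, but its feature dictionary is still available as input to whatever classifier is trained on labelled URLs.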
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Swarnkar, M., Sharma, N., Kumar Thakkar, H. (2023). Malicious URL Detection Using Machine Learning. In: Thakkar, H.K., Swarnkar, M., Bhadoria, R.S. (eds) Predictive Data Security using AI. Studies in Computational Intelligence, vol 1065. Springer, Singapore. https://doi.org/10.1007/978-981-19-6290-5_11
DOI: https://doi.org/10.1007/978-981-19-6290-5_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-6289-9
Online ISBN: 978-981-19-6290-5
eBook Packages: Engineering (R0)