ABSTRACT
Machine learning (ML) classifiers have been widely deployed to detect Android malware, but their use faces an emerging problem: as malware evolves, the performance of such classifiers degrades, or ages, significantly over time. Prior work has proposed retraining or active learning to restore and improve aged models. However, the underlying classifier itself remains blind to malware evolution. Unsurprisingly, such evolution-insensitive retraining or active learning comes at a price: tens of thousands of malware samples must be labeled, at the cost of significant human effort. In this paper, we propose the first framework, called APIGraph, that enhances state-of-the-art malware classifiers with similarity information among evolved Android malware, expressed in terms of semantically equivalent or similar API usages, thus naturally slowing down classifier aging. Our evaluation shows that, because classifier aging slows down, APIGraph significantly reduces the human effort that active learning requires for labeling new malware samples.
Index Terms
- Enhancing State-of-the-art Classifiers with API Semantics to Detect Evolved Android Malware