DOI: 10.1145/3372297.3417291
research-article

Enhancing State-of-the-art Classifiers with API Semantics to Detect Evolved Android Malware

Published: 02 November 2020

ABSTRACT

Machine learning (ML) classifiers have been widely deployed to detect Android malware, but their application faces an emerging problem: the performance of such classifiers degrades, or ages, significantly over time as malware evolves. Prior work has proposed retraining or active learning to restore and improve aged models. However, the underlying classifier itself remains blind to malware evolution. Unsurprisingly, such evolution-insensitive retraining or active learning comes at a price: the labeling of tens of thousands of malware samples and significant human effort. In this paper, we propose the first framework, called APIGraph, to enhance state-of-the-art malware classifiers with similarity information among evolved Android malware, expressed in terms of semantically equivalent or similar API usages, thus naturally slowing down classifier aging. Our evaluation shows that, because classifier aging slows down, APIGraph substantially reduces the human effort that active learning requires for labeling new malware samples.
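
The idea of treating semantically similar APIs alike can be illustrated with a minimal sketch. It assumes a small, hand-written mapping from API names to semantic groups; the abstract does not say how such groups are obtained, so the mapping, the group labels, and the helper name `to_cluster_features` are all hypothetical and are not APIGraph's actual implementation. The sketch only shows why two variants that call different APIs can look identical to a classifier once features are expressed at the group level.

```python
# Hypothetical API-to-group assignment, written by hand for illustration.
# How real groups would be derived is NOT specified in the abstract.
API_CLUSTER = {
    "TelephonyManager.getDeviceId": "device_identifier",
    "TelephonyManager.getImei": "device_identifier",
    "HttpURLConnection.connect": "network_send",
    "SSLSocket.startHandshake": "network_send",
}


def to_cluster_features(apis):
    """Map a set of API names to the sorted list of semantic groups they hit.

    APIs with no known group are kept as their own feature.
    """
    return sorted({API_CLUSTER.get(api, api) for api in apis})


# An older sample and an evolved variant that swaps in similar APIs.
old_variant = {"TelephonyManager.getDeviceId", "HttpURLConnection.connect"}
new_variant = {"TelephonyManager.getImei", "SSLSocket.startHandshake"}

old_features = to_cluster_features(old_variant)
new_features = to_cluster_features(new_variant)

# At the API level the two samples share no feature; at the group level
# they are indistinguishable.
print(old_variant & new_variant)
print(old_features == new_features)
```

A classifier trained on the group-level features of the older sample would see the evolved variant as the same input, which is the intuition behind slower aging; how well this holds in practice depends entirely on the quality of the grouping.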

Published in

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security
October 2020
2180 pages
ISBN: 9781450370899
DOI: 10.1145/3372297

Copyright © 2020 ACM

Publisher: Association for Computing Machinery, New York, NY, United States