research-article

Enhancing State-of-the-art Classifiers with API Semantics to Detect Evolved Android Malware

Authors:
Xiaohan Zhang, Fudan University, Shanghai, China
Yuan Zhang, Fudan University, Shanghai, China
Ming Zhong, Fudan University, Shanghai, China
Daizong Ding, Fudan University, Shanghai, China
Yinzhi Cao, Johns Hopkins University, Baltimore, MD, USA
Yukun Zhang, Fudan University, Shanghai, China
Mi Zhang, Fudan University, Shanghai, China
Min Yang, Fudan University, Shanghai, China

CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, October 2020, Pages 757–770, https://doi.org/10.1145/3372297.3417291

Published: 02 November 2020

ABSTRACT

Machine learning (ML) classifiers are widely deployed to detect Android malware, but they face an emerging problem: as malware evolves, their performance degrades significantly over time, a phenomenon known as model aging. Prior work has proposed retraining or active learning to restore aged models. However, the underlying classifier itself remains blind to malware evolution. Unsurprisingly, such evolution-insensitive retraining or active learning comes at a price: tens of thousands of malware samples must be labeled, at the cost of significant human effort. In this paper, we propose the first framework, called APIGraph, that enhances state-of-the-art malware classifiers with similarity information among evolved Android malware, expressed in terms of semantically equivalent or similar API usages, and thereby naturally slows classifier aging. Our evaluation shows that, because classifier aging slows down, APIGraph significantly reduces the human effort that active learning requires to label new malware samples.
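The intuition behind using semantically similar API usages can be illustrated with a toy sketch. It is not APIGraph's implementation: the `API_TO_CLUSTER` table, the cluster labels, and the `cluster_features` helper are hypothetical, and the API names are used only as illustrative strings. Assuming some grouping of similar APIs is available, two malware variants that share no individual API can still map to the same features.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical grouping of semantically similar APIs (an assumption for
# illustration; the abstract does not specify how such groups are built).
API_TO_CLUSTER: Dict[str, str] = {
    "TelephonyManager.getDeviceId": "read_device_identifier",
    "TelephonyManager.getImei": "read_device_identifier",
    "HttpURLConnection.connect": "open_network_connection",
    "SSLSocketFactory.createSocket": "open_network_connection",
}


def cluster_features(apis: Iterable[str]) -> FrozenSet[str]:
    """Replace each API with its cluster label; unknown APIs keep their name."""
    return frozenset(API_TO_CLUSTER.get(api, api) for api in apis)


# An older sample and an evolved variant that swaps in similar APIs.
old_variant = ["TelephonyManager.getDeviceId", "HttpURLConnection.connect"]
new_variant = ["TelephonyManager.getImei", "SSLSocketFactory.createSocket"]

# At the level of individual APIs the two variants have nothing in common.
raw_overlap = set(old_variant) & set(new_variant)

# At the level of clusters they are indistinguishable.
same_after_clustering = cluster_features(old_variant) == cluster_features(new_variant)
```

A classifier trained on cluster-level features of the older variant would, under these assumptions, see the evolved variant as the same input, which is the sense in which similarity information could slow aging.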

Index Terms

Enhancing State-of-the-art Classifiers with API Semantics to Detect Evolved Android Malware
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
    1. Malware and its mitigation
  2. Systems security
    1. Operating systems security
      1. Mobile platform security

Published in
CCS '20: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security
October 2020
2180 pages
ISBN: 9781450370899
DOI: 10.1145/3372297
General Chairs: Jay Ligatti (University of South Florida, USA), Xinming Ou (University of South Florida, USA)
Program Chairs: Jonathan Katz (University of Maryland, USA), Giovanni Vigna (University of California-Santa Barbara, USA)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
api semantics
evolved malware detection
model aging