Learning and Classification of Malware Behavior

Rieck, Konrad; Holz, Thorsten; Willems, Carsten; Düssel, Patrick; Laskov, Pavel

doi:10.1007/978-3-540-70542-0_6

Konrad Rieck¹, Thorsten Holz², Carsten Willems², Patrick Düssel¹ & Pavel Laskov¹,³

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5137)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

Malicious software in the form of Internet worms, computer viruses, and Trojan horses poses a major threat to the security of networked systems. The diversity and number of its variants severely undermine the effectiveness of classical signature-based detection. Yet variants of a malware family share typical behavioral patterns that reflect their origin and purpose. We aim to exploit these shared patterns for the classification of malware and propose a method for learning and discriminating malware behavior. Our method proceeds in three stages: (a) the behavior of collected malware is monitored in a sandbox environment, (b) a malware behavior classifier is trained with learning techniques on a corpus of malware labeled by an anti-virus scanner, and (c) discriminative features of the behavior models are ranked to explain classification decisions. Experiments with heterogeneous test data collected over several months using honeypots demonstrate the effectiveness of our method, especially in detecting novel instances of malware families that were previously not recognized by commercial anti-virus software.
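
The three stages can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the behavior reports, operation names, and family labels below are hypothetical stand-ins for sandbox output and anti-virus labels, and a plain perceptron is swapped in for whatever learning technique the paper actually uses. The sketch only shows the general shape of the idea: embed each report as feature counts, train a linear classifier, then rank features by their learned weight to explain a decision.

```python
from collections import Counter

def embed(report):
    """Map a behavior report (list of operation strings) to feature counts."""
    return Counter(report)

def score(weights, bias, features):
    """Linear decision function over the counted features."""
    return bias + sum(weights[f] * v for f, v in features.items())

def train_perceptron(samples, labels, epochs=20):
    """Train a binary linear classifier; labels are +1 (family) or -1 (rest)."""
    weights, bias = Counter(), 0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            if label * score(weights, bias, features) <= 0:
                # Misclassified: shift the hyperplane toward this sample.
                for f, v in features.items():
                    weights[f] += label * v
                bias += label
    return weights, bias

def predict(weights, bias, features):
    return 1 if score(weights, bias, features) > 0 else -1

def top_features(weights, k=3):
    """Rank features by weight; the largest ones speak for the family."""
    return sorted(weights, key=lambda f: weights[f], reverse=True)[:k]

# Stage (a): hypothetical sandbox reports, one list of operations per sample.
reports = [
    ["create_file:system32", "open_socket:irc", "set_registry:run"],
    ["open_socket:irc", "set_registry:run", "create_mutex"],
    ["create_file:temp", "send_mail:smtp"],
    ["send_mail:smtp", "read_file:addressbook"],
]
# Hypothetical anti-virus labels: +1 = family of interest, -1 = other malware.
labels = [1, 1, -1, -1]

# Stage (b): train the behavior classifier.
samples = [embed(r) for r in reports]
weights, bias = train_perceptron(samples, labels)

# Stage (c): rank discriminative features to explain decisions.
explanation = top_features(weights, k=3)

unseen = embed(["open_socket:irc", "create_mutex"])
print(predict(weights, bias, unseen), explanation)
```

A linear model is convenient here because each feature has a single weight, so the ranking in stage (c) falls directly out of the trained classifier.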

Author information

Authors and Affiliations

Intelligent Data Analysis Department, Fraunhofer Institute FIRST, Berlin, Germany
Konrad Rieck, Patrick Düssel & Pavel Laskov
Laboratory for Dependable Distributed Systems, University of Mannheim, Mannheim, Germany
Thorsten Holz & Carsten Willems
Wilhelm-Schickard-Institute for Computer Science, University of Tübingen, Tübingen, Germany
Pavel Laskov

Editor information

Diego Zamboni

About this paper

Cite this paper

Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P. (2008). Learning and Classification of Malware Behavior. In: Zamboni, D. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2008. Lecture Notes in Computer Science, vol 5137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70542-0_6

DOI: https://doi.org/10.1007/978-3-540-70542-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70541-3
Online ISBN: 978-3-540-70542-0
eBook Packages: Computer Science, Computer Science (R0)
