Learning and Classification of Malware Behavior

  • Conference paper
Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2008)

Abstract

Malicious software in the form of Internet worms, computer viruses, and Trojan horses poses a major threat to the security of networked systems. The diversity and number of its variants severely undermine the effectiveness of classical signature-based detection. Yet variants of a malware family share typical behavioral patterns that reflect their origin and purpose. We aim to exploit these shared patterns for the classification of malware and propose a method for learning and discriminating malware behavior. Our method proceeds in three stages: (a) the behavior of collected malware is monitored in a sandbox environment, (b) a malware behavior classifier is trained using learning techniques on a corpus of malware labeled by an anti-virus scanner, and (c) discriminative features of the behavior models are ranked to explain classification decisions. Experiments with heterogeneous test data collected over several months using honeypots demonstrate the effectiveness of our method, especially in detecting novel instances of malware families previously not recognized by commercial anti-virus software.
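The three stages can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the learning technique, so a one-vs-rest perceptron over normalized operation frequencies stands in for the classifier here. The report format, the operation names (`open_irc`, `send_mail`, and so on), and the family labels (`bot`, `mailer`) are all hypothetical toy data.

```python
import math
from collections import Counter


def embed(report):
    """Map a behavior report (list of operation names) to an L2-normalized frequency vector."""
    counts = Counter(report)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {op: c / norm for op, c in counts.items()}


def dot(w, x):
    """Sparse dot product between a weight vector and a feature vector."""
    return sum(w.get(op, 0.0) * v for op, v in x.items())


def train(reports, labels, epochs=20):
    """Stage (b): train one perceptron weight vector per family (one-vs-rest)."""
    weights = {family: {} for family in sorted(set(labels))}
    vectors = [embed(r) for r in reports]
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            for family, w in weights.items():
                target = 1 if family == y else -1
                if target * dot(w, x) <= 0:  # misclassified: move weights toward target
                    for op, v in x.items():
                        w[op] = w.get(op, 0.0) + target * v
    return weights


def classify(weights, report):
    """Assign the family whose weight vector scores the report highest."""
    x = embed(report)
    return max(weights, key=lambda family: dot(weights[family], x))


def top_features(weights, family, k=3):
    """Stage (c): rank operations by their weight for one family."""
    w = weights[family]
    return sorted(w, key=w.get, reverse=True)[:k]


# Stage (a) stand-in: hypothetical sandbox reports, labeled as if by an anti-virus scanner.
reports = [
    ["create_file", "open_irc", "open_irc"],
    ["create_file", "open_irc", "set_registry"],
    ["create_file", "send_mail", "send_mail"],
    ["set_registry", "send_mail", "create_file"],
]
labels = ["bot", "bot", "mailer", "mailer"]

weights = train(reports, labels)
print(classify(weights, ["open_irc", "create_file"]))
print(top_features(weights, "mailer", k=1))
```

Because the model is linear, each weight attaches directly to one observed operation, which is what makes ranking features a plausible way to explain a decision in this sketch.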



Editor information

Diego Zamboni

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P. (2008). Learning and Classification of Malware Behavior. In: Zamboni, D. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2008. Lecture Notes in Computer Science, vol 5137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70542-0_6

  • DOI: https://doi.org/10.1007/978-3-540-70542-0_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70541-3

  • Online ISBN: 978-3-540-70542-0

  • eBook Packages: Computer Science (R0)
