A boosting ensemble for the recognition of code sharing in malware

  • Original Paper
Journal in Computer Virology

Abstract

Research and development efforts have recently begun to compare malware variants, since malware authors are believed to reuse code. A number of these projects have focused on identifying functions through the use of signature-based classifiers. We introduce three new classifiers that characterize a function’s use of global data. Experiments on malware show that we can meaningfully correlate functions on the basis of their global data references even when the functions share little code. We also present an algorithm that combines existing classifiers and our new ones into an ensemble for correlating functions in two binary programs. For testing, we developed a model for comparing our work to previous signature-based classifiers. We then used that model to show that our combined ensemble classifier dominates the previously reported classifiers. Malware analysts can use the resulting ensemble when comparing two binaries: it correlates both functions and global data references between the two, allowing quick identification of any sharing that is occurring.
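The idea of correlating functions by combining several classifiers can be illustrated with a minimal Python sketch. It is not the algorithm from the paper, whose details are not given in this abstract. The function summaries (`opcode_ngrams`, `global_refs`), the Jaccard-based classifiers, the fixed weights, and the threshold are all assumptions made for illustration; in a boosting setting the weights would be learned rather than set by hand.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical function summary: a set of opcode n-grams and a set of
# labels for the global data the function references. Illustrative only.
Function = Dict[str, Set[str]]
Classifier = Callable[[Function, Function], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def code_similarity(f: Function, g: Function) -> float:
    """Stand-in for a signature-style classifier based on code content."""
    return jaccard(f["opcode_ngrams"], g["opcode_ngrams"])


def data_similarity(f: Function, g: Function) -> float:
    """Stand-in for a classifier based on global data references."""
    return jaccard(f["global_refs"], g["global_refs"])


def ensemble_score(f: Function, g: Function,
                   weighted: List[Tuple[Classifier, float]]) -> float:
    """Weighted average of the individual classifier scores."""
    total = sum(w for _, w in weighted)
    return sum(w * clf(f, g) for clf, w in weighted) / total


def correlate(funcs_a: Dict[str, Function], funcs_b: Dict[str, Function],
              weighted: List[Tuple[Classifier, float]],
              threshold: float = 0.3) -> Dict[str, str]:
    """Map each function in A to its best-scoring function in B, if any
    candidate scores strictly above the (assumed) threshold."""
    matches: Dict[str, str] = {}
    for name_a, f in funcs_a.items():
        best_name, best_score = None, threshold
        for name_b, g in funcs_b.items():
            score = ensemble_score(f, g, weighted)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_name is not None:
            matches[name_a] = best_name
    return matches


# Toy data: send_mail and sub_401000 share no code but the same global data.
binary_a = {
    "send_mail": {"opcode_ngrams": {"push,call", "call,mov", "mov,ret"},
                  "global_refs": {"smtp_host", "mail_body"}},
    "scan": {"opcode_ngrams": {"xor,loop"}, "global_refs": {"ip_table"}},
}
binary_b = {
    "sub_401000": {"opcode_ngrams": {"lea,call", "call,test", "jz,ret"},
                   "global_refs": {"smtp_host", "mail_body"}},
    "sub_402000": {"opcode_ngrams": {"xor,loop", "loop,ret"},
                   "global_refs": {"ip_table"}},
}

code_only = [(code_similarity, 1.0)]
combined = [(code_similarity, 0.5), (data_similarity, 0.5)]

print(correlate(binary_a, binary_b, code_only))
print(correlate(binary_a, binary_b, combined))
```

In this toy data the code-only classifier misses the `send_mail` pairing because the two functions share no opcode n-grams, while the combined score recovers it from the shared global data references.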

Author information

Corresponding author

Correspondence to Stanley J. Barr.

About this article

Cite this article

Barr, S.J., Cardman, S.J. & Martin, D.M. A boosting ensemble for the recognition of code sharing in malware. J Comput Virol 4, 335–345 (2008). https://doi.org/10.1007/s11416-008-0087-z

  • DOI: https://doi.org/10.1007/s11416-008-0087-z
