Abstract
Research and development efforts have recently begun to compare malware variants, because malware authors are believed to reuse code. Several of these projects identify functions using signature-based classifiers. We introduce three new classifiers that characterize a function's use of global data. Experiments on malware show that functions can be meaningfully correlated on the basis of their global data references even when they share little code. We also present an algorithm that combines existing classifiers with our new ones into an ensemble for correlating functions across two binary programs. To evaluate this ensemble, we developed a model for comparing our work with previous signature-based classifiers, and we used that model to show that the new combined ensemble classifier dominates the previously reported classifiers. Malware analysts can use the resulting ensemble when comparing two binaries: it correlates both functions and global data references between them, so that any code sharing can be identified quickly.
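The abstract does not specify how the global-data classifiers or the ensemble are computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each function has already been reduced to two hypothetical feature sets: `globals` (the contents of the global data it references, since addresses usually shift between variants) and `opcodes` (a crude stand-in for a signature-based classifier). Each classifier scores a pair of functions by Jaccard similarity, the ensemble is a weighted vote, and functions in two binaries are paired greedily above a threshold. All function names, feature names, weights, and the threshold are illustrative assumptions.

```python
from itertools import product


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two feature sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def global_data_score(f1: dict, f2: dict) -> float:
    # Hypothetical global-data classifier: compares the *contents* of the
    # global data each function references, not their addresses.
    return jaccard(f1["globals"], f2["globals"])


def code_score(f1: dict, f2: dict) -> float:
    # Hypothetical stand-in for a signature-based classifier:
    # overlap of the instruction mnemonics each function uses.
    return jaccard(f1["opcodes"], f2["opcodes"])


def ensemble_score(f1: dict, f2: dict, classifiers) -> float:
    # Weighted vote over (weight, classifier) pairs, normalised to [0, 1].
    total = sum(w for w, _ in classifiers)
    return sum(w * clf(f1, f2) for w, clf in classifiers) / total


def match_functions(bin_a: dict, bin_b: dict, classifiers, threshold=0.5):
    """Greedy one-to-one matching of functions across two binaries."""
    candidates = sorted(
        ((ensemble_score(fa, fb, classifiers), na, nb)
         for (na, fa), (nb, fb) in product(bin_a.items(), bin_b.items())),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, na, nb in candidates:
        if score < threshold:
            break  # candidates are sorted, so every remaining score is lower
        if na not in used_a and nb not in used_b:
            pairs.append((na, nb, round(score, 3)))
            used_a.add(na)
            used_b.add(nb)
    return pairs


# Usage with made-up data: sub_401000 and sub_501230 share no opcodes,
# but they reference the same global data, so the ensemble still pairs them.
variant_a = {
    "sub_401000": {"globals": frozenset({"smtp.example", "HELO"}),
                   "opcodes": frozenset({"push", "call"})},
    "sub_402000": {"globals": frozenset({"Software\\Run"}),
                   "opcodes": frozenset({"mov", "xor"})},
}
variant_b = {
    "sub_501230": {"globals": frozenset({"smtp.example", "HELO"}),
                   "opcodes": frozenset({"mov", "jmp"})},
    "sub_502000": {"globals": frozenset({"Software\\Run"}),
                   "opcodes": frozenset({"mov", "xor"})},
}
classifiers = [(2.0, global_data_score), (1.0, code_score)]
print(match_functions(variant_a, variant_b, classifiers))
# [('sub_402000', 'sub_502000', 1.0), ('sub_401000', 'sub_501230', 0.667)]
```

The weights here are hand-picked for illustration. In a boosting procedure such as AdaBoost, the final hypothesis is likewise a weighted vote of the component classifiers, but the weights are learned from labelled examples rather than fixed by hand.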
Cite this article
Barr, S.J., Cardman, S.J. & Martin, D.M. A boosting ensemble for the recognition of code sharing in malware. J Comput Virol 4, 335–345 (2008). https://doi.org/10.1007/s11416-008-0087-z
DOI: https://doi.org/10.1007/s11416-008-0087-z