ABSTRACT
Different Java compilers and compiler versions, e.g., javac or ecj, produce different bytecode from the same source code. This makes it hard to verify whether the bytecode of an open-source library really matches the provided source code. Moreover, it prevents one from detecting which open-source libraries have been recompiled and rebundled into a single jar, which is a common way to distribute an application. Such rebundling is problematic because it prevents one from checking whether the jar file contains open-source libraries with known vulnerabilities. To cope with these problems, we propose the tool SootDiff, which uses Soot's intermediate representation Jimple, in combination with code clone detection techniques, to reduce dissimilarities introduced by different compilers and to identify clones. Our results show that SootDiff successfully identifies clones in 102 of 144 cases, whereas bytecode comparison succeeds in only 58 cases.
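The general idea of normalizing before comparing can be illustrated with a minimal sketch. This is not SootDiff's implementation and does not use Soot: the class name `JimpleLikeCompare`, the local-naming pattern, and the sample statements are all hypothetical, and the statements are only loosely styled after a three-address representation. The sketch renames locals in order of first appearance and then scores two statement lists with a longest-common-subsequence ratio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JimpleLikeCompare {
    // Assumption: locals look like "$r0", "i5", "r1" (optional '$', one letter, digits).
    private static final Pattern LOCAL = Pattern.compile("\\$?\\b[a-z]\\d+\\b");

    /** Renames locals to v0, v1, ... in order of first appearance. */
    static List<String> normalize(List<String> statements) {
        Map<String, String> names = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String stmt : statements) {
            Matcher m = LOCAL.matcher(stmt);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String canonical = names.get(m.group());
                if (canonical == null) {
                    canonical = "v" + names.size();
                    names.put(m.group(), canonical);
                }
                m.appendReplacement(sb, Matcher.quoteReplacement(canonical));
            }
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }

    /** Similarity in [0, 1]: 2 * LCS length / (|a| + |b|), compared statement by statement. */
    static double similarity(List<String> a, List<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        int[][] lcs = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                lcs[i][j] = a.get(i - 1).equals(b.get(j - 1))
                        ? lcs[i - 1][j - 1] + 1
                        : Math.max(lcs[i - 1][j], lcs[i][j - 1]);
            }
        }
        return 2.0 * lcs[a.size()][b.size()] / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // Made-up statement lists that differ only in how locals are named.
        List<String> first = Arrays.asList("$r0 = r1 + 1", "return $r0");
        List<String> second = Arrays.asList("i5 = i7 + 1", "return i5");
        System.out.println("raw:        " + similarity(first, second));
        System.out.println("normalized: " + similarity(normalize(first), normalize(second)));
    }
}
```

In this toy setting the raw lists share no statement, while the normalized lists are identical. Real compiler differences go well beyond naming, which is why the approach described above relies on an intermediate representation rather than on text rewriting.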
Index Terms
- SootDiff: bytecode comparison across different Java compilers