DOI: 10.1145/3315568.3329966
research-article

SootDiff: bytecode comparison across different Java compilers

Published: 22 June 2019

ABSTRACT

Different Java compilers and compiler versions, such as javac and ecj, produce different bytecode from the same source code. This makes it hard to verify whether the bytecode of an open-source library really matches the provided source code. It also prevents one from detecting which open-source libraries have been recompiled and rebundled into a single jar, a common way to distribute an application. Such rebundling is problematic because it prevents one from checking whether the jar file contains open-source libraries with known vulnerabilities. To address these problems, we propose SootDiff, a tool that combines Soot's intermediate representation Jimple with code clone detection techniques. It reduces dissimilarities introduced by different compilers and identifies clones. Our results show that SootDiff successfully identifies clones in 102 of 144 cases, whereas plain bytecode comparison succeeds in only 58 cases.
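The abstract does not spell out which normalizations SootDiff applies to Jimple. The Python code below is a minimal sketch built on one assumption: part of the compiler-induced dissimilarity lies in how Jimple locals are numbered (for example `$i1` versus `$i2` for the same temporary). Under that assumption, the code renames locals in order of first appearance and then scores two statement sequences with `difflib.SequenceMatcher`. Everything here is a hypothetical illustration, not SootDiff's actual implementation. That includes the Jimple fragments, the local-name pattern, the function names `normalize`, `similarity` and `is_clone`, and the 0.9 threshold. `SequenceMatcher` (Ratcliff/Obershelp matching) stands in for whatever comparison the tool really uses.

```python
import difflib
import re

# Assumption: Jimple locals look like r0, i0, $r1, $z2 (type letter + number,
# temporaries prefixed with '$'). This pattern is illustrative, not Soot's grammar.
LOCAL = re.compile(r"(?<![\w$])\$?[bcdfilrsz]\d+(?!\w)")


def normalize(jimple_lines):
    """Rename locals to v0, v1, ... in order of first appearance.

    Blank lines are dropped and surrounding whitespace is stripped, so that
    formatting differences do not affect the comparison.
    """
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]

    return [LOCAL.sub(rename, line.strip()) for line in jimple_lines if line.strip()]


def similarity(a, b):
    """Similarity in [0, 1] of two Jimple statement sequences after normalization."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_clone(a, b, threshold=0.9):  # hypothetical threshold
    return similarity(a, b) >= threshold


# Hypothetical Jimple for the same method, as two compilers might emit it.
javac_jimple = [
    "r0 := @this: Example",
    "i0 := @parameter0: int",
    "$i1 = i0 * 2",
    "return $i1",
]
ecj_jimple = [
    "r0 := @this: Example",
    "i0 := @parameter0: int",
    "$i2 = i0 * 2",
    "return $i2",
]

raw = difflib.SequenceMatcher(None, javac_jimple, ecj_jimple).ratio()
print(raw)                                   # 0.5
print(similarity(javac_jimple, ecj_jimple))  # 1.0
print(is_clone(javac_jimple, ecj_jimple))    # True
```

The raw fragments score only 0.5 because the two temporaries are numbered differently. After renaming, the sequences are identical. A real pipeline would obtain the Jimple from Soot itself and would likely need further normalizations beyond local renaming.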

Published in

SOAP 2019: Proceedings of the 8th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis
June 2019
43 pages
ISBN: 9781450367202
DOI: 10.1145/3315568

        Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rate (SOAP, overall): 11 of 11 submissions, 100%