Abstract
Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.
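Why undefined behavior matters for automatic wrong-code detection can be shown with a small sketch. If a test program executes, say, a signed integer overflow, the C standard places no requirement on its result, so two compilers that print different outputs may both be correct and the difference proves nothing. The code below is an illustration only, not Csmith's output or implementation: the helper names `safe_add_int`, `safe_div_int`, `checksum_step`, and `run_test_case` are hypothetical, and returning the first operand when an operation would be undefined is an arbitrary choice made here for simplicity.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: add two ints without ever executing a signed
 * overflow (undefined behavior in C). If the sum would not fit, return
 * the first operand unchanged, an arbitrary but well-defined fallback. */
static int safe_add_int(int a, int b)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return a;
    }
    return a + b;
}

/* Hypothetical helper: divide while guarding the two undefined cases,
 * division by zero and INT_MIN / -1. */
static int safe_div_int(int a, int b)
{
    if (b == 0 || (a == INT_MIN && b == -1)) {
        return a;
    }
    return a / b;
}

/* Fold one value into a running checksum. Unsigned arithmetic wraps
 * modulo 2^32, so this step is well defined for every input. */
static uint32_t checksum_step(uint32_t sum, int value)
{
    return sum * 31u + (uint32_t)value;
}

/* Hypothetical stand-in for a generated test case: compute a few values
 * through the guarded helpers and summarize them as one checksum. */
static uint32_t run_test_case(void)
{
    int g1 = safe_add_int(INT_MAX, 1);  /* would overflow: stays INT_MAX */
    int g2 = safe_div_int(INT_MIN, -1); /* would overflow: stays INT_MIN */
    int g3 = safe_add_int(40, 2);       /* ordinary case: 42 */

    uint32_t sum = 0;
    sum = checksum_step(sum, g1);
    sum = checksum_step(sum, g2);
    sum = checksum_step(sum, g3);
    return sum;
}
```

Because every operation in `run_test_case` has a single defined result on a given platform, a program that prints its return value should produce the same number under any correct compiler and optimization level for that platform. A mismatch between two builds then points at a compiler bug rather than at the test program.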
Finding and understanding bugs in C compilers
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation