ABSTRACT
We present a "negative" semantics of the C11 language: a semantics that does not just give meaning to correct programs, but also rejects undefined programs. We investigate undefined behavior in C and discuss the techniques and special considerations needed for formally specifying it. We have used these techniques to modify and extend a semantics of C into one that captures undefined behavior. The amount of semantic infrastructure and effort required to achieve this was unexpectedly high, in the end nearly doubling the size of the original semantics. From our semantics, we have automatically extracted an undefinedness checker, which we evaluate against other popular analysis tools, using our own test suite in addition to a third-party test suite. Our checker is capable of detecting examples of all 77 categories of core language undefinedness appearing in the C11 standard, more than any other tool we considered. Based on this evaluation, we argue that our work is the most comprehensive and complete semantic treatment of undefined behavior in C, and thus of the C language itself.
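To make the idea of rejecting undefined programs concrete, the following is a minimal sketch of a single "negative" rule, written by hand in C. It is not the checker described above, which is extracted from a formal semantics; the function name `checked_add` is hypothetical and covers only one kind of undefinedness (signed integer overflow on addition). Instead of computing a result for every input, it refuses inputs whose addition would be undefined.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not taken from the paper: stores a + b in *out and
   returns true only when the sum is representable as an int. When the
   addition would overflow (undefined behavior in C11), it returns false
   and leaves *out untouched, so the undefined operation never executes. */
bool checked_add(int a, int b, int *out)
{
    /* Test the bounds before adding, using only operations that cannot
       themselves overflow. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false; /* a + b is not representable: reject */
    }
    *out = a + b;
    return true;
}
```

Under this sketch, `checked_add(1, 2, &r)` succeeds and stores 3 in `r`, while `checked_add(INT_MAX, 1, &r)` is rejected. A full negative semantics would need an analogous side condition for every category of undefinedness, which gives some intuition for why capturing it required so much additional infrastructure.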
Defining the undefinedness of C. PLDI '15.