Overhauling SC atomics in C11 and OpenCL

Published: 11 January 2016

ABSTRACT

Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic operations and fences in the C11 and OpenCL memory models is subtle, leading to convoluted prose descriptions that translate to complex axiomatic formalisations. We conduct an overhaul of SC atomics in C11, reducing the associated axioms in both number and complexity. A consequence of our simplification is that the SC operations in an execution no longer need to be totally ordered. This relaxation enables, for the first time, efficient and exhaustive simulation of litmus tests that use SC atomics. We extend our improved C11 model to obtain the first rigorous memory model formalisation for OpenCL (which extends C11 with support for heterogeneous many-core programming). In the OpenCL setting, we refine the SC axioms still further to give a sensible semantics to SC operations that employ a ‘memory scope’ to restrict their visibility to specific threads. Our overhaul requires slight strengthenings of both the C11 and the OpenCL memory models, causing some behaviours to become disallowed. We argue that these strengthenings are natural, and that all of the formalised C11 and OpenCL compilation schemes of which we are aware (Power and x86 CPUs for C11, AMD GPUs for OpenCL) remain valid in our revised models. Using the HERD memory model simulator, we show that our overhaul leads to an exponential improvement in simulation time for C11 litmus tests compared with the original model, making *exhaustive* simulation competitive, time-wise, with the *non-exhaustive* CDSChecker tool.
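To make the notion of exhaustively simulating a litmus test concrete, here is a minimal sketch that enumerates every interleaving of the classic store-buffering test under plain sequential consistency, in the interleaving sense of Lamport. It is a brute-force illustration only, not the axiomatic approach taken by HERD or by the models in the paper. All names (`THREAD_0`, `interleavings`, `run`, `sc_outcomes`) are hypothetical and chosen for this example.

```python
from itertools import combinations

# Store-buffering litmus test; x and y are initially 0.
# ("store", var, value) writes shared memory.
# ("load", var, reg) reads shared memory into a thread-local register.
THREAD_0 = [("store", "x", 1), ("load", "y", "r0")]
THREAD_1 = [("store", "y", 1), ("load", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory (the SC view)."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for op, var, arg in schedule:
        if op == "store":
            memory[var] = arg
        else:
            registers[arg] = memory[var]
    return registers["r0"], registers["r1"]


def sc_outcomes():
    """Collect the (r0, r1) results of all interleavings."""
    return {run(s) for s in interleavings(THREAD_0, THREAD_1)}


print(sorted(sc_outcomes()))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome `(0, 0)` never appears: under sequential consistency at least one store precedes both loads. The number of interleavings grows combinatorially with the size of the test, which is one reason naive enumeration of this kind does not scale.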

            Published in

            POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
            January 2016
            815 pages
            ISBN: 9781450335492
            DOI: 10.1145/2837614

            ACM SIGPLAN Notices, Volume 51, Issue 1
            POPL '16
            January 2016
            815 pages
            ISSN: 0362-1340
            EISSN: 1558-1160
            DOI: 10.1145/2914770
            Editor: Andy Gill

            Copyright © 2016 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate: 824 of 4,130 submissions, 20%
