ABSTRACT
Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic operations and fences in the C11 and OpenCL memory models is subtle, leading to convoluted prose descriptions that translate to complex axiomatic formalisations. We conduct an overhaul of SC atomics in C11, reducing the associated axioms in both number and complexity. A consequence of our simplification is that the SC operations in an execution no longer need to be totally ordered. This relaxation enables, for the first time, efficient and exhaustive simulation of litmus tests that use SC atomics. We extend our improved C11 model to obtain the first rigorous memory model formalisation for OpenCL (which extends C11 with support for heterogeneous many-core programming). In the OpenCL setting, we refine the SC axioms still further to give a sensible semantics to SC operations that employ a ‘memory scope’ to restrict their visibility to specific threads. Our overhaul requires slight strengthenings of both the C11 and the OpenCL memory models, causing some behaviours to become disallowed. We argue that these strengthenings are natural, and that all of the formalised C11 and OpenCL compilation schemes of which we are aware (Power and x86 CPUs for C11, AMD GPUs for OpenCL) remain valid in our revised models. Using the HERD memory model simulator, we show that our overhaul leads to an exponential improvement in simulation time for C11 litmus tests compared with the original model, making *exhaustive* simulation competitive, time-wise, with the *non-exhaustive* CDSChecker tool.
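To make the notion of exhaustively simulating a litmus test concrete, here is a minimal C sketch that enumerates every sequentially consistent interleaving of the classic store-buffering test. It is a toy under stated assumptions: the function names `run_schedule` and `sb_outcome_reachable` are hypothetical, and SC is modelled as plain interleaving of program-ordered steps. It is not the axiomatic model described above, nor how HERD or CDSChecker work.

```c
#include <assert.h>
#include <stdbool.h>

/* Store-buffering (SB) litmus test, with every access treated as SC:
 *   Thread 0: x = 1; r0 = y;
 *   Thread 1: y = 1; r1 = x;
 * Assumption: an SC execution is modelled as an interleaving of the two
 * threads that preserves each thread's program order. */

/* Run one interleaving. Bit i of `schedule` names the thread (0 or 1)
 * that performs step i. Returns false if the schedule would give one
 * thread more than its two steps, i.e. it is not a valid interleaving. */
static bool run_schedule(unsigned schedule, int *r0, int *r1)
{
    assert(r0 && r1);

    int x = 0, y = 0;      /* shared locations, initially zero */
    int pc[2] = {0, 0};    /* next instruction index per thread */

    for (int step = 0; step < 4; step++) {
        int t = (int)((schedule >> step) & 1u);
        if (pc[t] == 2)
            return false;  /* thread t has already finished */
        if (t == 0) {
            if (pc[0] == 0) x = 1; else *r0 = y;
        } else {
            if (pc[1] == 0) y = 1; else *r1 = x;
        }
        pc[t]++;
    }
    return true;
}

/* Returns true if some SC interleaving ends with r0 == want_r0 and
 * r1 == want_r1. All 16 four-bit schedules are tried; the invalid ones
 * are rejected by run_schedule. */
bool sb_outcome_reachable(int want_r0, int want_r1)
{
    for (unsigned schedule = 0; schedule < 16; schedule++) {
        int r0 = -1, r1 = -1;
        if (run_schedule(schedule, &r0, &r1) &&
            r0 == want_r0 && r1 == want_r1)
            return true;
    }
    return false;
}
```

Calling `sb_outcome_reachable(0, 0)` asks whether both loads can read the initial value. Under this interleaving model no schedule produces that outcome, which is the behaviour SC atomics are meant to rule out for this test.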
Overhauling SC atomics in C11 and OpenCL. POPL '16.