LC-MEMENTO: A Memory Model for Accelerated Architectures

Conference paper in Languages and Compilers for Parallel Computing (LCPC 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13181)

Abstract

With the advent of heterogeneous architectures, and in particular the ubiquity of multi-GPU systems, managing device memory efficiently is increasingly important for reaping the benefits of the additional core count. To date, this responsibility falls mainly on the programmer: device-to-host data communication (and vice versa), if not handled properly, may incur costly memory transfer operations and synchronization. The problem is compounded by the additional requirement of maintaining system-wide memory consistency, which can involve expensive synchronization overhead. In this paper, we present the Location Consistency Memory Model for Enhanced Transfer Operations (LC-MEMENTO), a framework that incorporates runtime techniques for multi-GPU memory management to support relaxed synchronization semantics and automatic memory transfer operations. Specifically, we implement a relaxed memory consistency model based on Location Consistency (LC) in an Asynchronous Many-Task Runtime (ARTS) and demonstrate that this memory model enables additional optimization opportunities for three representative applications encompassing different computational patterns (scientific computation, graphs, and data streaming).
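To make the manual burden concrete, the sketch below (plain CUDA written for this summary; it is not the LC-MEMENTO or ARTS API, and the kernel, buffer sizes, and names are hypothetical) shows the explicit per-GPU transfers and the conservative, system-wide synchronization point that such a runtime aims to relax or hide.

    // Illustrative sketch only (plain CUDA): the hand-written multi-GPU
    // transfer/synchronization pattern that a runtime-managed approach such as
    // LC-MEMENTO targets. This is not the ARTS API; kernel and sizes are
    // hypothetical.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main() {
        int avail = 0;
        cudaGetDeviceCount(&avail);
        const int nGpus = (avail >= 2) ? 2 : 1;
        const int chunk = 1 << 20;

        float *host;
        cudaMallocHost(&host, nGpus * chunk * sizeof(float));  // pinned host buffer
        for (int i = 0; i < nGpus * chunk; ++i) host[i] = 1.0f;

        float *dev[2];
        cudaStream_t stream[2];
        for (int g = 0; g < nGpus; ++g) {
            cudaSetDevice(g);
            cudaMalloc(&dev[g], chunk * sizeof(float));
            cudaStreamCreate(&stream[g]);
            // Programmer-managed host-to-device copy of this GPU's chunk.
            cudaMemcpyAsync(dev[g], host + g * chunk, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, stream[g]);
            scale<<<(chunk + 255) / 256, 256, 0, stream[g]>>>(dev[g], chunk);
            // Programmer-managed device-to-host copy of the result.
            cudaMemcpyAsync(host + g * chunk, dev[g], chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, stream[g]);
        }
        // Conservative, system-wide synchronization: every GPU must drain its
        // stream before the host may observe any of the data.
        for (int g = 0; g < nGpus; ++g) {
            cudaSetDevice(g);
            cudaStreamSynchronize(stream[g]);
            cudaFree(dev[g]);
            cudaStreamDestroy(stream[g]);
        }
        printf("host[0] = %f\n", host[0]);
        cudaFreeHost(host);
        return 0;
    }

Under the approach described in the abstract, the runtime would take on the transfer and consistency management that is spelled out explicitly here.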

Notes

  1. Their ancillary functionality can be set and reset across different phases of the program.

  2. A practical memory model is one that application developers can use to write non-chaotic codes, since all of its non-determinism can be contained by special operators.

  3. A concept in computer science and mathematics in which an operator can be applied multiple times without changing the result/state of the computation after the first application.
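As a minimal sketch of the idempotence property in note 3 (not taken from the paper; the clampTo operator is a hypothetical example), applying the operator a second time leaves the result unchanged:

    // Minimal illustration of idempotence (note 3); clampTo is a hypothetical
    // operator, not part of LC-MEMENTO or ARTS.
    #include <algorithm>
    #include <cassert>

    // Idempotent operator: a value already at or below the bound is unchanged
    // by further applications, so clampTo(clampTo(x)) == clampTo(x).
    int clampTo(int x, int bound = 100) { return std::min(x, bound); }

    int main() {
        int x = 250;
        assert(clampTo(clampTo(x)) == clampTo(x));  // second application changes nothing
        return 0;
    }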

Author information

Correspondence to Kiran Ranganath or Joshua Suetterlein.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Ranganath, K. et al. (2022). LC-MEMENTO: A Memory Model for Accelerated Architectures. In: Li, X., Chandrasekaran, S. (eds) Languages and Compilers for Parallel Computing. LCPC 2021. Lecture Notes in Computer Science, vol 13181. Springer, Cham. https://doi.org/10.1007/978-3-030-99372-6_5

  • DOI: https://doi.org/10.1007/978-3-030-99372-6_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99371-9

  • Online ISBN: 978-3-030-99372-6

  • eBook Packages: Computer Science (R0)
