Research article · DOI: 10.1145/3307650.3322222

Linebacker: preserving victim cache lines in idle register files of GPUs

Published: 22 June 2019

ABSTRACT

Modern GPUs suffer from cache contention because a limited cache is shared across tens of concurrently running warps. To increase the per-warp cache size, prior techniques proposed warp throttling, which limits the number of active warps. Warp throttling, however, leaves many registers dynamically unused whenever a warp is throttled. Given the stringent cache size limitation in GPUs, this work proposes a new cache management technique named Linebacker (LB) that improves GPU performance by utilizing idle register file space as victim cache space. Whenever a CTA becomes inactive, Linebacker backs up the registers of the throttled CTA to off-chip memory and then uses the corresponding register file space as victim cache space. If a load instruction finds its data in a victim cache line, the data is copied directly to the destination register through a simple register-register move operation. To further improve the efficiency of the victim cache, Linebacker allocates victim cache space only to the select few load instructions that exhibit high data locality. Through a careful design of the victim cache indexing and management scheme, Linebacker provides a 29.0% speedup over previously proposed warp throttling techniques.
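The mechanism in the abstract can be illustrated with a minimal behavioral sketch. This is not the paper's implementation: all names (`RegisterFile`, `Linebacker`, `hot_loads`) are hypothetical, and the model only captures the high-level policy the abstract states: freed registers of throttled CTAs become victim-cache capacity, victim lines are allocated only for high-locality loads, and a victim hit is serviced by a register-register move.

```python
# Behavioral sketch of the Linebacker idea, assuming hypothetical names.
# Timing, banking, indexing, and off-chip backup traffic are not modeled.

class RegisterFile:
    """A register file whose freed (throttled-CTA) slots can hold victim lines."""
    def __init__(self, num_regs):
        self.regs = {r: 0 for r in range(num_regs)}
        self.freed = set()  # register slots released by throttled CTAs

    def release_cta(self, regs):
        # The paper backs these registers up to off-chip memory first;
        # here we only mark the slots as available victim-cache space.
        self.freed.update(regs)

class Linebacker:
    def __init__(self, rf, hot_loads):
        self.rf = rf
        self.hot_loads = hot_loads  # PCs of loads with high data locality
        self.victim = {}            # address -> data; capacity = freed slots

    def on_evict(self, pc, addr, data):
        """On L1 eviction, keep the line only for high-locality loads,
        and only while freed register slots remain available."""
        if pc in self.hot_loads and len(self.victim) < len(self.rf.freed):
            self.victim[addr] = data

    def on_load(self, addr, dst_reg):
        """Victim hit: a register-register move fills the destination directly."""
        if addr in self.victim:
            self.rf.regs[dst_reg] = self.victim[addr]
            return True   # hit: no memory request issued
        return False      # miss: falls through to L1/L2/memory (not modeled)

# Example: one throttled CTA frees four register slots; a hot load later hits.
rf = RegisterFile(8)
rf.release_cta({4, 5, 6, 7})
lb = Linebacker(rf, hot_loads={0x100})
lb.on_evict(0x100, addr=0xDEAD, data=42)
assert lb.on_load(0xDEAD, dst_reg=2)  # victim hit: rf.regs[2] is now 42
```

The sketch reflects the two policy decisions the abstract emphasizes: victim capacity exists only while registers are freed by throttling, and allocation is restricted to the few loads with high locality so that low-locality data does not pollute the victim space.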


Published in

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture
June 2019, 849 pages
ISBN: 9781450366694
DOI: 10.1145/3307650

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance rates: ISCA '19 accepted 62 of 365 submissions (17%); overall acceptance rate 543 of 3,203 submissions (17%).
