research-article

Linebacker: preserving victim cache lines in idle register files of GPUs

Authors:
Yunho Oh, EPFL, Lausanne, Switzerland
Gunjae Koo, Hongik University, Seoul, Korea
Murali Annavaram, University of Southern California
Won Woo Ro, Yonsei University, Seoul, Korea

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, June 2019, Pages 183–196
https://doi.org/10.1145/3307650.3322222

Published: 22 June 2019

ABSTRACT

Modern GPUs suffer from cache contention because a limited cache is shared across tens of concurrently running warps. To increase the per-warp cache size, prior techniques proposed warp throttling, which limits the number of active warps. Warp throttling leaves registers dynamically unused whenever a warp is throttled. Given the stringent cache size limitation in GPUs, this work proposes a new cache management technique named Linebacker (LB) that improves GPU performance by using idle register file space as victim cache space. Whenever a CTA becomes inactive, Linebacker backs up the registers of the throttled CTA to off-chip memory and then uses the corresponding register file space as victim cache space. If a load instruction finds its data in a victim cache line, the data is copied directly to the destination register through a simple register-to-register move operation. To further improve the efficiency of the victim cache, Linebacker allocates victim cache space only to a select few load instructions that exhibit high data locality. Through a careful design of the victim cache indexing and management scheme, Linebacker provides a 29.0% speedup compared to previously proposed warp throttling techniques.
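
The flow described in the abstract can be illustrated with a small software model. The following is only a toy sketch under stated assumptions, not the paper's hardware design: the class name `ToyLinebacker`, the LRU replacement in both structures, the per-line bookkeeping of the filling load's PC, and the fixed set of "high-locality" PCs are all hypothetical choices made for illustration. The abstract does not specify the actual indexing or management scheme.

```python
from collections import OrderedDict


class ToyLinebacker:
    """Toy model: an LRU L1 cache plus a victim cache carved out of idle registers.

    Everything here is an illustrative assumption; it is not the paper's design.
    """

    def __init__(self, l1_lines, high_locality_pcs):
        self.l1 = OrderedDict()        # line address -> PC of the load that filled it
        self.l1_lines = l1_lines
        self.victim = OrderedDict()    # line address -> True, kept in LRU order
        self.victim_lines = 0          # capacity grows when a CTA is throttled
        self.high_locality_pcs = set(high_locality_pcs)

    def throttle_cta(self, freed_lines):
        # Assumption: the throttled CTA's registers were already backed up
        # off-chip, so their space can now hold `freed_lines` victim lines.
        self.victim_lines += freed_lines

    def load(self, pc, line_addr):
        if line_addr in self.l1:
            self.l1.move_to_end(line_addr)
            return "l1_hit"
        if line_addr in self.victim:
            # Modelled as a register-to-register move; this toy model does
            # not refill the L1 on a victim hit.
            self.victim.move_to_end(line_addr)
            return "victim_hit"
        self._fill_l1(pc, line_addr)
        return "miss"

    def _fill_l1(self, pc, line_addr):
        if len(self.l1) >= self.l1_lines:
            evicted_addr, evicted_pc = self.l1.popitem(last=False)
            self._maybe_preserve(evicted_pc, evicted_addr)
        self.l1[line_addr] = pc

    def _maybe_preserve(self, pc, line_addr):
        # Only lines brought in by selected high-locality loads are preserved.
        if self.victim_lines == 0 or pc not in self.high_locality_pcs:
            return
        if len(self.victim) >= self.victim_lines:
            self.victim.popitem(last=False)
        self.victim[line_addr] = True


# Hypothetical trace of (load PC, cache line address) pairs.
trace = [(0x10, 0xA), (0x10, 0xB), (0x20, 0xC), (0x10, 0xA)]

cache = ToyLinebacker(l1_lines=2, high_locality_pcs={0x10})
cache.throttle_cta(freed_lines=2)
results = [cache.load(pc, addr) for pc, addr in trace]
print(results)
```

In this trace, line `0xA` is evicted from the two-line L1 when `0xC` arrives, is preserved because its filling load (PC `0x10`) is marked high-locality, and is then found in the victim space on the final access. Without a throttled CTA the victim capacity stays at zero and the same access misses.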

Published in
ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture
June 2019
849 pages
ISBN: 9781450366694
DOI: 10.1145/3307650
General Chair: Srilatha (Bobbie) Manne (Microsoft)
Program Chairs: Hillery Hunter (IBM), Erik Altman (IBM Research)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CTA scheduling
GPU
cache
register file
Qualifiers
- research-article
Conference

Acceptance Rates
ISCA '19 Paper Acceptance Rate: 62 of 365 submissions, 17%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%