ABSTRACT
Diverse IP cores are integrated on a modern system-on-chip and share resources. Off-chip memory bandwidth is often the scarcest resource and requires careful allocation. Two of the most important cores, the CPU and the GPU, can both simultaneously demand high bandwidth. We demonstrate that conventional quality-of-service allocation techniques can severely constrict GPU performance by allowing the CPU to occasionally monopolize shared bandwidth. We propose to dynamically adapt the priority of CPU and GPU memory requests based on a novel mechanism that tracks progress of GPU workloads. Our evaluation shows that the proposed mechanism significantly improves GPU performance with only minimal impact on the CPU.
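The abstract does not say how GPU progress is tracked or how priorities are switched, so the following is only a minimal sketch of the general idea of progress-based priority adaptation. The linear progress model, the notion of "work units" per frame, and all names (`FrameProgress`, `gpu_has_priority`, `pick_next`) are assumptions made for illustration, not the authors' mechanism.

```python
from dataclasses import dataclass


@dataclass
class FrameProgress:
    """Hypothetical per-frame progress model (an assumption, not from the paper)."""
    total_work: int        # work units in the frame, e.g. tiles; assumed known up front
    deadline_cycles: int   # cycles budgeted to finish the frame

    def expected_done(self, elapsed_cycles: int) -> float:
        """Work that should be complete by now if progress were linear in time."""
        frac = min(elapsed_cycles / self.deadline_cycles, 1.0)
        return frac * self.total_work


def gpu_has_priority(frame: FrameProgress, work_done: int, elapsed_cycles: int) -> bool:
    """True when the GPU lags its expected progress and should outrank the CPU."""
    return work_done < frame.expected_done(elapsed_cycles)


def pick_next(frame, work_done, elapsed_cycles, cpu_queue, gpu_queue):
    """Serve the GPU first only while it is behind schedule; otherwise serve the CPU first."""
    gpu_first = gpu_has_priority(frame, work_done, elapsed_cycles)
    first, second = (gpu_queue, cpu_queue) if gpu_first else (cpu_queue, gpu_queue)
    if first:
        return first.pop(0)
    if second:
        return second.pop(0)
    return None
```

In this sketch the CPU keeps priority while the GPU is on schedule, which is one way to limit the impact on CPU performance; the GPU is promoted only when it falls behind its frame budget.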
Index Terms
- A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC