ABSTRACT
Chip multiprocessors (CMPs) share a large portion of the memory subsystem among multiple cores. Recent proposals have addressed high-performance and fair management of these shared resources; however, none of them takes prefetch requests into account. Without prefetching, significant performance is lost, which is why existing systems prefetch. Because they ignore prefetch requests, recent shared-resource management proposals often significantly degrade both performance and fairness in the presence of prefetching, rather than improve them.
This paper is the first to propose mechanisms that both manage the shared resources of a multi-core chip to obtain high performance and fairness and exploit prefetching. We apply our proposed mechanisms to two resource-based management techniques for memory scheduling and one source-throttling-based management technique for the entire shared memory system. We show that our mechanisms improve the performance of a 4-core system that uses network fair queuing, parallelism-aware batch scheduling, and fairness via source throttling by 11.0%, 10.9%, and 11.3%, respectively, while also significantly improving fairness.
Prefetch-aware shared resource management for multi-core systems, ISCA '11