research-article

Prefetch-aware shared resource management for multi-core systems

Authors:
Eiman Ebrahimi, The University of Texas at Austin, Austin, TX, USA
Chang Joo Lee, Intel Corporation, Austin, TX, USA
Onur Mutlu, Carnegie Mellon University, Pittsburgh, PA, USA
Yale N. Patt, The University of Texas at Austin, Austin, TX, USA

ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture, June 2011, Pages 141–152
https://doi.org/10.1145/2000064.2000081

Published: 04 June 2011

ABSTRACT

Chip multiprocessors (CMPs) share a large portion of the memory subsystem among multiple cores. Recent proposals have addressed high-performance and fair management of these shared resources; however, none of them take into account prefetch requests. Without prefetching, significant performance is lost, which is why existing systems prefetch. By not taking into account prefetch requests, recent shared-resource management proposals often significantly degrade both performance and fairness, rather than improve them in the presence of prefetching.

This paper is the first to propose mechanisms that both manage the shared resources of a multi-core chip to obtain high performance and fairness, and exploit prefetching. We apply our proposed mechanisms to two resource-based management techniques for memory scheduling and one source-throttling-based management technique for the entire shared memory system. We show that our mechanisms improve the performance of a 4-core system that uses network fair queuing, parallelism-aware batch scheduling, and fairness via source throttling by 11.0%, 10.9%, and 11.3% respectively, while also significantly improving fairness.



Index Terms

Prefetch-aware shared resource management for multi-core systems
1. Applied computing
  1. Computers in other domains
    1. Personal computers and PC applications
      1. Microcomputers
2. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data



Published in
ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
June 2011
488 pages
ISBN:9781450304726
DOI:10.1145/2000064
General Chairs: Ravi Iyer (Intel), Qing Yang (University of Rhode Island)
Program Chair: Antonio González (Intel and UPC)
ACM SIGARCH Computer Architecture News Volume 39, Issue 3
ISCA '11
June 2011
462 pages
ISSN:0163-5964
DOI:10.1145/2024723
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
fairness
multi-core
prefetching
shared resources
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 543 of 3,203 submissions, 17%
