research-article

Multi-grain coherence directories

Authors:
Jason Zebchuk, University of Toronto
Babak Falsafi, EcoCloud, EPFL
Andreas Moshovos, University of Toronto

MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, December 2013, pages 359–370, https://doi.org/10.1145/2540708.2540739

Published: 7 December 2013

ABSTRACT

Conventional directory coherence operates at the finest granularity possible, that of a cache block. While simple, this organization fails to exploit a common application behavior: at any given point in time, large, contiguous chunks of memory are often accessed by only a single core.

We take advantage of this behavior and investigate reducing the coherence directory size by tracking coherence at multiple granularities. We show that such a Multi-grain Directory (MGD) can significantly reduce the required number of directory entries across a variety of workloads. Our analysis shows that a simple dual-grain directory (DGD), which tracks individual cache blocks and coarse-grain regions of 1 KB to 8 KB, obtains the majority of the benefit. We propose a practical DGD design that is transparent to software, requires no changes to the coherence protocol, and introduces no unnecessary bandwidth overhead. This design can reduce the coherence directory size by 41% to 66% with no statistically significant performance loss.
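The dual-grain idea can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions, not the DGD design proposed in the paper. The class `DualGrainDirectory`, the 64-byte block and 4 KB region sizes, and the policy for a second core touching a private region are all hypothetical. Under that policy, the directory allocates a block entry that conservatively keeps the region owner as a sharer, while the region entry continues to cover the rest of the region. Evictions, writes, and invalidations are not modeled.

```python
from dataclasses import dataclass, field

# Hypothetical toy model for illustration only; not the paper's DGD design.
BLOCK_SIZE = 64     # bytes per cache block (assumed)
REGION_SIZE = 4096  # bytes per coarse-grain region (assumed, within 1 KB to 8 KB)


@dataclass
class DualGrainDirectory:
    """Region entries record a single owner core for a private region.
    Block entries record the sharer set of one block and take precedence."""
    regions: dict = field(default_factory=dict)  # region number -> owner core
    blocks: dict = field(default_factory=dict)   # block number -> set of sharer cores

    @staticmethod
    def _split(addr):
        # Map a byte address to its block number and region number.
        return addr // BLOCK_SIZE, addr // REGION_SIZE

    def access(self, core, addr):
        block, region = self._split(addr)
        if block in self.blocks:
            # Block already tracked at fine grain: record the new sharer.
            self.blocks[block].add(core)
        elif region in self.regions:
            owner = self.regions[region]
            if owner != core:
                # Assumed policy: a second core touches a private region, so this
                # block gets a fine-grain entry. The owner is conservatively kept
                # as a sharer (a real protocol might probe it instead).
                self.blocks[block] = {owner, core}
        else:
            # First access to an untracked region: a single coarse-grain entry.
            self.regions[region] = core

    def sharers(self, addr):
        block, region = self._split(addr)
        if block in self.blocks:
            return set(self.blocks[block])
        if region in self.regions:
            return {self.regions[region]}
        return set()

    def entry_count(self):
        return len(self.regions) + len(self.blocks)


d = DualGrainDirectory()
for addr in range(0, REGION_SIZE, BLOCK_SIZE):
    d.access(0, addr)      # core 0 streams through one 4 KB region
d.access(1, 128)           # core 1 touches a single block of that region
print(d.entry_count())     # 2
print(d.sharers(128))      # {0, 1}
```

In this example, one region entry covers all 64 blocks that core 0 touches, where a purely block-grain directory would need 64 entries. Only the block that core 1 also accesses falls back to a fine-grain entry.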

Index Terms

Computer systems organization > Architectures > Parallel architectures

Published in
MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
December 2013
498 pages
ISBN: 9781450326384
DOI: 10.1145/2540708
General Chair: Matthew Farrens, UC Davis
Program Chair: Christos Kozyrakis, Stanford University
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: cache coherence, coherence directory
Acceptance Rates
MICRO-46 paper acceptance rate: 39 of 239 submissions, 16%
Overall acceptance rate: 484 of 2,242 submissions, 22%
Article Metrics
- 43 total citations
- 566 total downloads
- 34 downloads in the last 12 months
- 4 downloads in the last 6 weeks