DOI: 10.1145/2540708.2540739
research-article

Multi-grain coherence directories

Published: 07 December 2013

ABSTRACT

Conventional directory coherence operates at the finest granularity possible, that of a cache block. While simple, this organization fails to exploit frequent application behavior: at any given point in time, large, continuous chunks of memory are often accessed only by a single core.

We take advantage of this behavior and investigate reducing the coherence directory size by tracking coherence at multiple different granularities. We show that such a Multi-grain Directory (MGD) can significantly reduce the required number of directory entries across a variety of different workloads. Our analysis shows a simple dual-grain directory (DGD) obtains the majority of the benefit while tracking individual cache blocks and coarse-grain regions of 1KB to 8KB. We propose a practical DGD design that is transparent to software, requires no changes to the coherence protocol, and has no unnecessary bandwidth overhead. This design can reduce the coherence directory size by 41% to 66% with no statistically significant performance loss.
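The entry savings described above can be illustrated with a toy model. The sketch below is not the paper's DGD design. It is a static count over a snapshot of cached `(core, address)` pairs, assuming 64-byte blocks and 1KB regions (the low end of the 1KB to 8KB range), and the function names are hypothetical. A region touched by a single core costs one coarse-grain entry. Blocks in regions touched by several cores are tracked individually.

```python
from collections import defaultdict

BLOCK_SIZE = 64      # bytes per cache block (assumed for illustration)
REGION_SIZE = 1024   # bytes per coarse-grain region (assumed; low end of 1KB-8KB)


def block_grain_entries(cached):
    """Conventional directory: one entry per distinct cached block."""
    return len({addr // BLOCK_SIZE for _core, addr in cached})


def dual_grain_entries(cached):
    """Toy dual-grain count: one entry per single-core region,
    plus one entry per block in regions touched by multiple cores."""
    cores_by_region = defaultdict(set)
    blocks_by_region = defaultdict(set)
    for core, addr in cached:
        region = addr // REGION_SIZE
        cores_by_region[region].add(core)
        blocks_by_region[region].add(addr // BLOCK_SIZE)

    entries = 0
    for region, cores in cores_by_region.items():
        if len(cores) == 1:
            entries += 1                              # private region: one coarse entry
        else:
            entries += len(blocks_by_region[region])  # shared region: per-block entries
    return entries


def example_snapshot():
    """Core 0 caches all 16 blocks of region 0; cores 0 and 1 share region 1."""
    private = [(0, b * BLOCK_SIZE) for b in range(16)]
    shared = [(0, REGION_SIZE), (1, REGION_SIZE), (1, REGION_SIZE + BLOCK_SIZE)]
    return private + shared


if __name__ == "__main__":
    snapshot = example_snapshot()
    print("block-grain entries:", block_grain_entries(snapshot))
    print("dual-grain entries: ", dual_grain_entries(snapshot))
```

In this snapshot the private region collapses from 16 block entries to one region entry, while the shared region still needs one entry per block. A real design would also have to handle a region changing from private to shared at run time, which this static count ignores.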


    • Published in

      MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
      December 2013
      498 pages
      ISBN: 9781450326384
      DOI: 10.1145/2540708

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      MICRO-46 Paper Acceptance Rate: 39 of 239 submissions, 16%
      Overall Acceptance Rate: 484 of 2,242 submissions, 22%
