DOI: 10.1145/2000064.2000081
research-article

Prefetch-aware shared resource management for multi-core systems

Published: 04 June 2011

ABSTRACT

Chip multiprocessors (CMPs) share a large portion of the memory subsystem among multiple cores. Recent proposals have addressed high-performance and fair management of these shared resources; however, none of them takes prefetch requests into account. Without prefetching, significant performance is lost, which is why existing systems prefetch. Because they ignore prefetch requests, recent shared-resource management proposals often significantly degrade both performance and fairness in the presence of prefetching, rather than improving them.

This paper is the first to propose mechanisms that both manage the shared resources of a multi-core chip to obtain high performance and fairness, and also exploit prefetching. We apply our proposed mechanisms to two resource-based management techniques for memory scheduling and one source-throttling-based management technique for the entire shared memory system. We show that our mechanisms improve the performance of a 4-core system that uses network fair queuing, parallelism-aware batch scheduling, and fairness via source throttling by 11.0%, 10.9%, and 11.3%, respectively, while also significantly improving fairness.

Supplemental Material

isca_3b_4.mp4 (mp4, 389.7 MB)

        Published in

        ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
        June 2011
        488 pages
        ISBN: 9781450304726
        DOI: 10.1145/2000064

        ACM SIGARCH Computer Architecture News, Volume 39, Issue 3
        ISCA '11
        June 2011
        462 pages
        ISSN: 0163-5964
        DOI: 10.1145/2024723

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 543 of 3,203 submissions, 17%
