CQoS: a framework for enabling QoS in shared caches of CMP platforms

Published: 26 June 2004

ABSTRACT

Cache hierarchies have traditionally been designed for use by a single application, thread or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge and their workloads range from single-threaded and multi-threaded applications to complex virtual machines (VMs), a shared cache resource will be consumed by these different entities, which generate heterogeneous memory access streams with different locality properties and varying memory sensitivity. As a result, conventional cache management approaches that treat all memory accesses equally are bound to result in inefficient space utilization and poor performance, even for applications with good locality properties. To address this problem, this paper presents a new cache management framework (CQoS) that (1) recognizes the heterogeneity in memory access streams, (2) introduces the notion of QoS to handle the varying degrees of locality and latency sensitivity and (3) assigns priorities to streams, and enforces them, based on latency sensitivity, locality degree and application performance needs. To achieve this, we propose CQoS options for priority classification, priority assignment and priority enforcement. We briefly describe CQoS priority classification and assignment options -- ranging from user-driven and developer-driven to compiler-detected and flow-based approaches. Our focus in this paper is on CQoS mechanisms for priority enforcement -- these include (1) selective cache allocation, (2) static/dynamic set partitioning and (3) heterogeneous cache regions. We discuss the architectural design and implementation complexity of these CQoS options. To evaluate their performance trade-offs, we have modeled these CQoS options in a cache simulator and evaluated them on CMP platforms running network-intensive server workloads. Our simulation results show the effectiveness of the proposed options and make the case for CQoS in future multi-threaded/multi-core platforms, since it improves shared cache efficiency and, as a result, increases overall system performance.
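
To make the idea of static set partitioning concrete, the following is a minimal sketch and not the paper's simulator or design. It assumes a toy set-associative cache with LRU replacement in which a fixed fraction of the sets is reserved for a high-priority stream. The class name `PartitionedCache`, the `"high"`/`"low"` priority labels, the `high_fraction` parameter and the `run_demo` helper are all hypothetical names introduced for illustration. It also assumes each cache line address belongs to exactly one stream.

```python
from collections import OrderedDict


class PartitionedCache:
    """Toy set-associative LRU cache with sets statically split by priority.

    Illustrative assumption only: the abstract does not specify how
    CQoS implements set partitioning.
    """

    def __init__(self, num_sets, ways, high_fraction):
        split = int(num_sets * high_fraction)
        if not 0 < split < num_sets:
            raise ValueError("each priority class needs at least one set")
        self.ways = ways
        # Each priority class owns a contiguous, disjoint range of sets.
        self.ranges = {"high": (0, split), "low": (split, num_sets)}
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = {"high": 0, "low": 0}
        self.misses = {"high": 0, "low": 0}

    def access(self, line_addr, priority):
        """Look up a cache line address; return True on a hit."""
        start, end = self.ranges[priority]
        index = start + line_addr % (end - start)
        lines = self.sets[index]
        if line_addr in lines:
            lines.move_to_end(line_addr)  # mark as most recently used
            self.hits[priority] += 1
            return True
        self.misses[priority] += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[line_addr] = True
        return False


def run_demo():
    """Interleave a small reused working set with a streaming access pattern."""
    cache = PartitionedCache(num_sets=8, ways=2, high_fraction=0.5)
    for round_no in range(3):
        for addr in range(8):  # high priority: 8 lines, reused every round
            cache.access(addr, "high")
        for i in range(100):  # low priority: streaming, never reused
            cache.access(1000 + round_no * 100 + i, "low")
    return cache


if __name__ == "__main__":
    demo = run_demo()
    print("high:", demo.hits["high"], "hits,", demo.misses["high"], "misses")
    print("low: ", demo.hits["low"], "hits,", demo.misses["low"], "misses")
```

In this sketch the streaming low-priority accesses can only evict lines within their own partition, so the reused high-priority working set keeps hitting after its first pass. A dynamic variant would adjust the split at run time, which this example does not attempt.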


    Published in

      ICS '04: Proceedings of the 18th annual international conference on Supercomputing
      June 2004
      360 pages
      ISBN: 1581138393
      DOI: 10.1145/1006209

      Copyright © 2004 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 584 of 2,055 submissions, 28%
