DOI: 10.1145/1519065.1519076
research-article

Towards practical page coloring-based multicore cache management

Published: 01 April 2009

ABSTRACT

Modern multi-core processors present new resource management challenges due to the subtle interactions of simultaneously executing processes sharing on-chip resources (particularly the L2 cache). Recent research demonstrates that the operating system may use the page coloring mechanism to control cache partitioning, and consequently to achieve fair and efficient cache utilization. However, page coloring places additional constraints on memory space allocation, which may conflict with application memory needs. Further, adaptive adjustments of cache partitioning policies in a multi-programmed execution environment may incur substantial overhead for page recoloring (or copying). This paper proposes a hot-page coloring approach enforcing coloring on only a small set of frequently accessed (or hot) pages for each process. The cost of identifying hot pages online is reduced by leveraging the knowledge of spatial locality during a page table scan of access bits. Our results demonstrate that hot page identification and selective coloring can significantly alleviate the coloring-induced adverse effects in practice. However, we also reach the somewhat negative conclusion that without additional hardware support, adaptive page coloring is only beneficial when recoloring is performed infrequently (meaning long scheduling time quanta in multi-programmed executions).
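As a minimal sketch of the page coloring mechanism the abstract refers to, the Python below computes how many page colors a cache offers and which color a physical page falls into. It assumes a physically indexed cache whose set index is taken directly from the address bits (no index hashing). The function names and the 4 MB, 16-way, 4 KB-page geometry in the usage note are illustrative assumptions, not the paper's implementation or evaluation platform.

```python
def num_colors(cache_bytes: int, associativity: int, page_bytes: int = 4096) -> int:
    """Number of distinct page colors for a physically indexed cache.

    One cache way spans cache_bytes / associativity bytes; each page-sized
    slice of a way is one color. Assumes a plain modulo set index.
    """
    way_bytes = cache_bytes // associativity
    return max(1, way_bytes // page_bytes)


def page_color(phys_addr: int, colors: int, page_bytes: int = 4096) -> int:
    """Color of the physical page containing phys_addr.

    Pages with the same color map to the same group of cache sets, so an
    OS that hands out disjoint color sets to two processes keeps them from
    evicting each other's cache lines.
    """
    frame_number = phys_addr // page_bytes
    return frame_number % colors
```

With the hypothetical geometry above, `num_colors(4 * 1024 * 1024, 16)` gives 64 colors, and `page_color(0x12345000, 64)` gives color 5. Restricting a process to k of those colors confines it to roughly k/64 of the cache, and also to k/64 of physical memory, which is the allocation constraint the abstract describes.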

          Published in

          EuroSys '09: Proceedings of the 4th ACM European conference on Computer systems
          April 2009
          342 pages
          ISBN: 9781605584829
          DOI: 10.1145/1519065

          Copyright © 2009 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 241 of 1,308 submissions, 18%