research-article

Towards practical page coloring-based multicore cache management

Authors:
Xiao Zhang, University of Rochester, Rochester, NY, USA
Sandhya Dwarkadas, University of Rochester, Rochester, NY, USA
Kai Shen, University of Rochester, Rochester, NY, USA

EuroSys '09: Proceedings of the 4th ACM European conference on Computer systems, April 2009, pages 89–102. https://doi.org/10.1145/1519065.1519076

Published: 01 April 2009


ABSTRACT

Modern multi-core processors present new resource management challenges due to the subtle interactions of simultaneously executing processes sharing on-chip resources (particularly the L2 cache). Recent research demonstrates that the operating system may use the page coloring mechanism to control cache partitioning, and consequently to achieve fair and efficient cache utilization. However, page coloring places additional constraints on memory space allocation, which may conflict with application memory needs. Further, adaptive adjustments of cache partitioning policies in a multi-programmed execution environment may incur substantial overhead for page recoloring (or copying). This paper proposes a hot-page coloring approach enforcing coloring on only a small set of frequently accessed (or hot) pages for each process. The cost of identifying hot pages online is reduced by leveraging the knowledge of spatial locality during a page table scan of access bits. Our results demonstrate that hot page identification and selective coloring can significantly alleviate the coloring-induced adverse effects in practice. However, we also reach the somewhat negative conclusion that without additional hardware support, adaptive page coloring is only beneficial when recoloring is performed infrequently (meaning long scheduling time quanta in multi-programmed executions).
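The abstract's mechanisms (page colors, access-bit scans, hot-page selection, and recoloring cost) can be made concrete with a small model. The Python sketch below is not the authors' implementation. It assumes a physically indexed shared cache, 4 KB pages, a power-of-two color count, and access bits delivered as one list of booleans per page-table scan. Every function name (`num_colors`, `page_color`, `scan_access_bits`, `hot_pages`, `pages_to_recolor`) is hypothetical, and the doubling-stride skip over cold pages is an assumed stand-in for the spatial-locality optimization the abstract mentions, not the paper's exact scan policy.

```python
"""Toy model of page coloring and hot-page selection (all names hypothetical)."""
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set

PAGE_SIZE = 4096  # assumed 4 KB pages


def num_colors(cache_bytes: int, ways: int, page_size: int = PAGE_SIZE) -> int:
    """Page colors of a physically indexed cache: size / (associativity * page size)."""
    return max(1, cache_bytes // (ways * page_size))


def page_color(pfn: int, colors: int) -> int:
    """Color of a physical frame number: the low PFN bits that also select
    the cache set (assumes a power-of-two color count)."""
    return pfn % colors


def scan_access_bits(bits: Sequence[bool], max_skip: int = 8) -> List[int]:
    """Return indices of pages whose access bit is set.

    Assumed spatial-locality shortcut: after a cold page the stride doubles
    (capped at max_skip), on the guess that its neighbors are cold too; after
    a hot page the stride resets to 1 so neighbors are checked. This reads
    fewer entries but can miss isolated hot pages. A real kernel would also
    clear each bit it reads and invalidate the TLB entry; omitted here.
    """
    accessed, i, stride = [], 0, 1
    while i < len(bits):
        if bits[i]:
            accessed.append(i)
            stride = 1
        else:
            stride = min(stride * 2, max_skip)
        i += stride
    return accessed


def hot_pages(scans: Iterable[Sequence[bool]], k: int, max_skip: int = 8) -> List[int]:
    """Count how many scans saw each page accessed and keep the k hottest."""
    counts = Counter()
    for bits in scans:
        counts.update(scan_access_bits(bits, max_skip))
    # Ties keep first-seen order (Counter.most_common is stable).
    return [page for page, _ in counts.most_common(k)]


def pages_to_recolor(hot: Iterable[int], pfn_of: Dict[int, int],
                     allowed: Set[int], colors: int) -> List[int]:
    """Hot pages whose current frame color is outside the process's
    allowed colors; each one would need a copy to a new frame."""
    return [vpn for vpn in hot if page_color(pfn_of[vpn], colors) not in allowed]
```

For example, a 4 MB, 16-way cache with 4 KB pages has 4 MB / (16 × 4 KB) = 64 colors. Because `pages_to_recolor` only examines the pages returned by `hot_pages`, a repartitioning step in this model copies at most k pages per process, which is the trade-off the abstract describes; the skipping scan lowers scan cost at the price of occasionally missing a hot page.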


Index Terms

1. General and reference
  1. Cross-computing tools and techniques



Published in
EuroSys '09: Proceedings of the 4th ACM European conference on Computer systems
April 2009
342 pages
ISBN: 9781605584829
DOI: 10.1145/1519065
General Chair: Wolfgang Schröder-Preikschat, FAU Erlangen-Nuremberg, Germany
Program Chairs: John Wilkes, Google, Inc., USA; Rebecca Isaacs, Microsoft Research, Cambridge, UK
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cache partitioning
multi-core
page coloring
resource management

Acceptance Rates
Overall acceptance rate: 241 of 1,308 submissions, 18%

Article Metrics
- 190 total citations
- 1,571 total downloads
- Downloads (last 12 months): 67
- Downloads (last 6 weeks): 9
