ABSTRACT
Modern multi-core processors present new resource management challenges due to the subtle interactions of simultaneously executing processes sharing on-chip resources (particularly the L2 cache). Recent research demonstrates that the operating system may use the page coloring mechanism to control cache partitioning, and consequently to achieve fair and efficient cache utilization. However, page coloring places additional constraints on memory space allocation, which may conflict with application memory needs. Further, adaptive adjustments of cache partitioning policies in a multi-programmed execution environment may incur substantial overhead for page recoloring (or copying). This paper proposes a hot-page coloring approach enforcing coloring on only a small set of frequently accessed (or hot) pages for each process. The cost of identifying hot pages online is reduced by leveraging the knowledge of spatial locality during a page table scan of access bits. Our results demonstrate that hot page identification and selective coloring can significantly alleviate the coloring-induced adverse effects in practice. However, we also reach the somewhat negative conclusion that without additional hardware support, adaptive page coloring is only beneficial when recoloring is performed infrequently (meaning long scheduling time quanta in multi-programmed executions).
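The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the cache geometry, the function names, and the fixed `jump` distance used to skip the neighbors of an unaccessed page are all assumptions made for illustration. The first part computes a page's color from its physical page number. The second scans a list of access bits, skips ahead after an unaccessed page on the assumption that nearby pages are also cold, and ranks pages by accumulated access count.

```python
# Illustrative sketch only. Names, cache geometry, and the jump heuristic
# are assumptions, not details taken from the paper.

def num_colors(cache_bytes, associativity, page_bytes):
    """Number of page colors: cache size / (associativity * page size)."""
    return cache_bytes // (associativity * page_bytes)


def page_color(physical_page_number, colors):
    """A page's color is given by the low-order page-number bits that
    also take part in indexing the physically indexed cache."""
    return physical_page_number % colors


def scan_access_bits(access_bits, counts, jump=2):
    """One scan over a list of access bits (1 = touched since last scan).

    Accessed pages get their count incremented and their bit cleared.
    After an unaccessed page, the next `jump` entries are skipped on the
    assumption (spatial locality) that its neighbors are cold as well.
    Returns the number of entries actually inspected.
    """
    inspected = 0
    i = 0
    while i < len(access_bits):
        inspected += 1
        if access_bits[i]:
            counts[i] = counts.get(i, 0) + 1
            access_bits[i] = 0      # reset so the next scan sees new accesses
            i += 1
        else:
            i += 1 + jump           # skip presumed-cold neighbors
    return inspected


def hottest_pages(counts, k):
    """Return the k page indices with the highest access counts
    (ties broken by lower page index)."""
    return sorted(counts, key=lambda page: (-counts[page], page))[:k]


if __name__ == "__main__":
    colors = num_colors(4 * 1024 * 1024, 16, 4096)  # assumed 4 MB, 16-way L2
    print("colors:", colors)
    print("color of page 130:", page_color(130, colors))

    counts = {}
    bits = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
    print("entries inspected:", scan_access_bits(bits, counts))
    print("hot pages:", hottest_pages(counts, 2))
```

In a scheme of this kind only the pages returned by `hottest_pages` would be restricted to a process's assigned colors, which is what keeps both the memory-allocation constraint and the recoloring cost small. Skipping entries trades some accuracy for scan speed, since a hot page inside a skipped range is missed until a later scan.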
Index Terms
- Towards practical page coloring-based multicore cache management