Abstract
Contention on the shared Last-Level Cache (LLC) can severely degrade the performance of applications executed on modern multicores. A promising software approach to LLC contention is page coloring, a technique that achieves performance isolation by partitioning the shared cache through careful physical memory allocation. Traditional page coloring assumes that the cache is indexed directly by physical address bits. However, recent multicore architectures (e.g., Intel Sandy Bridge and later) replaced this direct addressing scheme with a more complex one that involves a hash function, rendering traditional page coloring ineffective on these architectures.
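Traditional page coloring exploits the overlap between the cache's set-index bits and the physical page frame number: by choosing which physical pages to hand to an application, the OS controls which cache sets it can occupy. A minimal sketch of the color computation, under illustrative geometry assumptions (4 KiB pages, 64-byte lines, a 2048-set LLC; not the parameters of any specific processor):

```c
#include <stdint.h>

/* Illustrative cache geometry (assumptions, not from the article). */
#define PAGE_SHIFT 12   /* 4 KiB pages */
#define LINE_SHIFT 6    /* 64 B cache lines */
#define SET_BITS   11   /* 2048 sets -> set index in address bits 6..16 */

/* The "color" of a physical page is the part of the set index that lies
 * above the page offset, i.e. the bits the OS controls when it picks a
 * physical frame. Here: bits 12..16, giving 32 colors. */
static unsigned page_color(uint64_t paddr)
{
    unsigned color_bits = LINE_SHIFT + SET_BITS - PAGE_SHIFT; /* 5 */
    return (unsigned)((paddr >> PAGE_SHIFT) & ((1u << color_bits) - 1));
}
```

Two pages with different colors can never conflict in the cache, so reserving disjoint color sets for different applications partitions the LLC — provided the cache is indexed directly by these physical address bits.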
In this article, we extend page coloring to these recent architectures by proposing a mechanism that handles their hash-based LLC addressing scheme. As with traditional page coloring, the goal of this new mechanism is to deliver performance isolation by avoiding contention on the LLC, thus enabling predictable performance. We implement this mechanism in the Linux kernel and evaluate it using several benchmarks from the SPEC CPU2006 and PARSEC 3.0 suites. Our results show that our solution delivers performance isolation to concurrently running applications by enforcing partitioning of a Sandy Bridge LLC, which traditional page coloring techniques cannot handle.
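On these recent architectures, the LLC is split into per-core slices, and the slice holding a line is chosen by a hash over many physical address bits rather than by a contiguous bit field. The following sketch illustrates the general shape of such a function; the bit masks below are placeholders chosen for readability, not the reverse-engineered Sandy Bridge hash:

```c
#include <stdint.h>

/* Parity (XOR-reduction) of the set bits of x. */
static unsigned parity64(uint64_t x)
{
    x ^= x >> 32; x ^= x >> 16; x ^= x >> 8;
    x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
    return (unsigned)(x & 1);
}

/* Illustrative 4-slice selection: each slice-index bit is the parity of
 * the physical address ANDed with a mask. These masks are hypothetical
 * placeholders, NOT the actual Sandy Bridge hash function. */
static unsigned llc_slice(uint64_t paddr)
{
    const uint64_t h0_mask = (1ULL << 17) | (1ULL << 18) | (1ULL << 20);
    const uint64_t h1_mask = (1ULL << 19) | (1ULL << 21);
    return parity64(paddr & h0_mask) | (parity64(paddr & h1_mask) << 1);
}
```

Because the hashed bits are spread across the address, two pages that traditional page coloring would treat as equivalent can land in different slices, and vice versa; a coloring scheme for these caches must therefore take the hash function itself into account.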
Index Terms
- A Software Cache Partitioning System for Hash-Based Caches