A Software Cache Partitioning System for Hash-Based Caches

Published: 16 December 2016

Abstract

Contention on the shared Last-Level Cache (LLC) can have a substantial negative impact on the performance of applications executed on modern multicores. Page coloring is a software technique that addresses LLC contention: it attempts to achieve performance isolation by partitioning a shared cache through careful memory management. The key assumption of traditional page coloring is that the cache is physically addressed. However, recent multicore architectures (e.g., Intel Sandy Bridge and later) switched from a physical addressing scheme to a more complex scheme that involves a hash function, so traditional page coloring is ineffective on them.
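
As a rough illustration of the traditional scheme, the sketch below computes the "color" of a physical page for a physically indexed, set-associative cache and filters free page frames by color. It is a minimal sketch under stated assumptions, not the implementation from the article: the 4 KiB page size, the example cache geometry, and the names `num_colors`, `page_color`, and `pick_frames` are all hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def num_colors(cache_bytes: int, ways: int, page_shift: int = PAGE_SHIFT) -> int:
    """Number of page colors for a physically indexed, set-associative cache."""
    way_bytes = cache_bytes // ways          # bytes covered by a single way
    return max(1, way_bytes >> page_shift)   # pages that fit in one way


def page_color(phys_addr: int, cache_bytes: int, ways: int,
               page_shift: int = PAGE_SHIFT) -> int:
    """Color of the page containing phys_addr: the page frame number
    modulo the number of colors (the set-index bits above the page offset)."""
    page_frame = phys_addr >> page_shift
    return page_frame % num_colors(cache_bytes, ways, page_shift)


def pick_frames(free_frames, allowed_colors, cache_bytes: int, ways: int):
    """Keep only the free page frames whose color belongs to a partition."""
    return [f for f in free_frames
            if page_color(f << PAGE_SHIFT, cache_bytes, ways) in allowed_colors]


# Hypothetical geometry: a 2 MiB, 16-way cache has 128 KiB per way, i.e. 32 colors.
CACHE_BYTES = 2 * 1024 * 1024
WAYS = 16
frames_for_partition = pick_frames(range(64), {0}, CACHE_BYTES, WAYS)
```

Pages with different colors map to disjoint groups of cache sets, so giving each application its own set of colors keeps their data apart in the cache. This reasoning relies on the set index being a plain slice of the physical address, which is the assumption that hash-based caches break.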

In this article, we extend page coloring to these recent architectures by proposing a mechanism that handles their hash-based LLC addressing scheme. As with traditional page coloring, the goal of this mechanism is to deliver performance isolation by avoiding contention on the LLC, thus enabling predictable performance. We implement the mechanism in the Linux kernel and evaluate it using several benchmarks from the SPEC CPU2006 and PARSEC 3.0 suites. Our results show that our solution delivers performance isolation to concurrently running applications by enforcing partitioning of a Sandy Bridge LLC, which traditional page coloring techniques cannot handle.
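
To make the idea of a hash-aware color concrete, the sketch below combines a slice identifier with the usual in-slice page color. The abstract does not describe the article's actual mechanism, so this is only an assumed illustration: it models the slice hash as the XOR (parity) of selected physical address bits, and the bit masks in `EXAMPLE_MASKS`, the value of `COLORS_PER_SLICE`, and all function names are hypothetical placeholders, not the real Sandy Bridge hash.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages

# Hypothetical masks: each one selects the address bits XORed into one
# bit of the slice id. These are placeholders, not a real hardware hash.
EXAMPLE_MASKS = ((1 << 17) | (1 << 18), 1 << 19)
COLORS_PER_SLICE = 32  # assumed number of page colors inside one slice


def parity(value: int) -> int:
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def slice_id(phys_addr: int, masks=EXAMPLE_MASKS) -> int:
    """Slice selected by an XOR-style hash of physical address bits."""
    sid = 0
    for bit, mask in enumerate(masks):
        sid |= parity(phys_addr & mask) << bit
    return sid


def hashed_color(phys_addr: int, masks=EXAMPLE_MASKS,
                 colors_per_slice: int = COLORS_PER_SLICE) -> int:
    """Color that accounts for both the slice and the in-slice set index."""
    in_slice = (phys_addr >> PAGE_SHIFT) % colors_per_slice
    return slice_id(phys_addr, masks) * colors_per_slice + in_slice


# Two pages with the same traditional color (0) but different slices:
color_a = hashed_color(0x0)
color_b = hashed_color(1 << 17)
```

Under these assumptions, both pages have in-slice color 0, yet they fall into different slices and therefore receive different hash-aware colors. A partitioning scheme that ignored the slice would treat them as the same color.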

    • Published in

      ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 4
      December 2016
      648 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/3012405

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 16 December 2016
      • Revised: 1 November 2016
      • Accepted: 1 November 2016
      • Received: 1 December 2015

      Qualifiers

      • research-article
      • Research
      • Refereed
