ABSTRACT
In an α-way set-associative cache, the cache is partitioned into disjoint sets of size α, and each item can only be cached in one set, typically selected via a hash function. Set-associative caches are widely used and have many benefits, e.g., in terms of latency or concurrency, over fully associative caches, but they often incur more cache misses. As the set size α decreases, the benefits increase, but the paging costs worsen.
In this paper we characterize the performance of an α-way set-associative LRU cache of total size k, as a function of α = α(k). We prove the following, assuming that sets are selected using a fully random hash function:
- For α = ω(log k), the paging cost of an α-way set-associative LRU cache is within additive O(1) of that of a fully associative LRU cache of size (1-o(1))k, with probability 1 - 1/poly(k), for all request sequences of length poly(k).
- For α = o(log k), and for all c = O(1) and r = O(1), the paging cost of an α-way set-associative LRU cache is not within a factor c of that of a fully associative LRU cache of size k/r, for some request sequence of length O(k^{1.01}).
- For α = ω(log k), if the hash function can be occasionally changed, the paging cost of an α-way set-associative LRU cache is within a factor 1 + o(1) of that of a fully associative LRU cache of size (1-o(1))k, with probability 1 - 1/poly(k), for request sequences of arbitrary (e.g., super-polynomial) length.
Some of our results generalize to other paging algorithms besides LRU, such as least-frequently used (LFU).
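To make the cache model concrete, the following is a minimal Python sketch of an α-way set-associative LRU cache next to a fully associative one. It is an illustration under stated assumptions, not the paper's construction: the class and function names (`LRUCache`, `SetAssociativeLRUCache`, `count_misses`) are hypothetical, and the fully random hash function is simulated by lazily sampling a set index for each new item from a seeded generator.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache holding at most `capacity` items."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # least recently used item comes first
        self.misses = 0

    def access(self, item):
        """Serve one request; return True on a hit, False on a miss."""
        if item in self.items:
            self.items.move_to_end(item)  # hit: mark as most recently used
            return True
        self.misses += 1
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
        self.items[item] = None
        return False


class SetAssociativeLRUCache:
    """alpha-way set-associative LRU cache of total size k (k // alpha sets)."""

    def __init__(self, k, alpha, seed=0):
        if k % alpha != 0:
            raise ValueError("alpha must divide k")
        self.sets = [LRUCache(alpha) for _ in range(k // alpha)]
        self.rng = random.Random(seed)
        # Assumption: a lazily sampled table stands in for a fully random hash.
        self.set_of = {}

    def access(self, item):
        if item not in self.set_of:
            self.set_of[item] = self.rng.randrange(len(self.sets))
        # Each item may only be cached in the one set it hashes to.
        return self.sets[self.set_of[item]].access(item)

    @property
    def misses(self):
        return sum(s.misses for s in self.sets)


def count_misses(cache, requests):
    """Run a request sequence through `cache` and return its total misses."""
    for item in requests:
        cache.access(item)
    return cache.misses


if __name__ == "__main__":
    k = 64
    # Cycle repeatedly over a working set that fits in half the cache.
    requests = [i % 32 for i in range(320)]
    print("fully associative:", count_misses(LRUCache(k), requests))
    for alpha in (1, 4, 16, 64):
        cache = SetAssociativeLRUCache(k, alpha)
        print(f"alpha = {alpha}:", count_misses(cache, requests))
```

With α = k there is a single set, so the set-associative cache behaves exactly like the fully associative one. For small α, items that collide in the same set can evict each other even though the cache as a whole has room, which is the source of the extra paging cost that the results above quantify.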
Index Terms
- An Associativity Threshold Phenomenon in Set-Associative Caches