ABSTRACT
In the last-level cache, a large number of blocks have reuse distances greater than the available cache capacity. Cache performance and efficiency can be improved if some subset of these distant-reuse blocks can reside in the cache longer. Bypassing is an effective and attractive solution because it prevents harmful blocks from being inserted into the cache.
Our analysis shows that bypass can contribute a significant performance improvement, and that the optimal bypass achieves performance similar to that of OPT+B, which is the theoretical optimal replacement policy. Thus, we propose a bypass technique called Optimal Bypass Monitor (OBM), which makes bypass decisions by learning and predicting the behavior of the optimal bypass. OBM keeps a short global track of incoming-victim block pairs. By detecting which block in each pair is reused first, the behavior of the optimal bypass on the track can be determined and used to guide the bypass choice.
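The pair-tracking idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OBM design itself: the class name `BypassMonitorSketch`, the `track_size` and `counter_max` parameters, and the use of a single global saturating counter are all hypothetical choices made for the example. The abstract states only that incoming-victim pairs are tracked and that the block reused first in each pair is detected.

```python
from collections import OrderedDict


class BypassMonitorSketch:
    """Toy monitor (hypothetical, not the paper's hardware design).

    Tracks a short, bounded list of (incoming, victim) pairs and learns
    which block of each pair is reused first.
    """

    def __init__(self, track_size=4, counter_max=3):
        # Maps incoming block address -> victim block address, oldest first.
        self.track = OrderedDict()
        self.track_size = track_size
        # Assumed single saturating counter: positive values favor bypass.
        self.counter = 0
        self.counter_max = counter_max

    def record_miss(self, incoming, victim):
        """Remember the pair formed when `incoming` would replace `victim`."""
        if len(self.track) >= self.track_size:
            self.track.popitem(last=False)  # drop the oldest pair
        self.track[incoming] = victim

    def observe_access(self, addr):
        """Resolve every tracked pair in which `addr` is the first block reused."""
        for incoming, victim in list(self.track.items()):
            if addr == incoming:
                # Incoming block reused first: inserting it was the better choice.
                self.counter = max(self.counter - 1, -self.counter_max)
                del self.track[incoming]
            elif addr == victim:
                # Victim reused first: bypassing the incoming block was better.
                self.counter = min(self.counter + 1, self.counter_max)
                del self.track[incoming]

    def should_bypass(self):
        """Predict bypass when recent pairs mostly favored keeping the victim."""
        return self.counter > 0


monitor = BypassMonitorSketch()
monitor.record_miss("A", "V1")   # block A arrives, V1 is the chosen victim
monitor.observe_access("V1")     # V1 is requested again before A
print(monitor.should_bypass())   # True
```

In this sketch a cache model would call `observe_access` on every access and `record_miss` only on misses, consulting `should_bypass` before inserting a new block. A single global counter keeps the example short; a real design could keep finer-grained prediction state.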
Any existing replacement policy can be extended with OBM with negligible design modification. Our experimental results show that, with less than 1.5 KB of extra storage, OBM combined with the NRU replacement policy outperforms LRU by 9.7% and 8.9% for single-thread and multi-programmed workloads, respectively. Compared with other state-of-the-art proposals such as DRRIP and SDBP, it achieves superior performance with less storage overhead.