ABSTRACT
Memory transfers, particularly to and from off-chip memories, consume a significant amount of power. To reduce off-chip memory traffic, one or more levels of cache can be placed on the same die as the processor core. For performance, energy, and cost reasons, it is desirable for the on-chip cache to be small and direct-mapped. Small, direct-mapped caches, however, generally produce much more traffic than necessary. The purpose of this paper is twofold. First, we measure how much traffic small, direct-mapped caches generate and compare it with the minimal amount of traffic. The difference is an upper bound on the traffic that can be saved by using the on-chip memory more effectively. Second, we survey techniques that can be deployed to reduce the traffic produced by direct-mapped caches and present results for some of them.
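To illustrate the kind of gap the abstract describes, the following is a minimal sketch, not the paper's methodology. It counts the bytes a read-only direct-mapped cache fetches for an address trace and compares that with a simple compulsory bound in which every distinct word is fetched exactly once. The function names, the cache and line sizes, the 4-byte word, and the example trace are all assumptions chosen for illustration; the paper's own definition of minimal traffic may differ.

```python
def direct_mapped_traffic(addresses, cache_bytes=1024, line_bytes=32):
    """Bytes fetched from off-chip memory by a direct-mapped cache (reads only)."""
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines      # line number currently held at each index
    traffic = 0
    for addr in addresses:
        line = addr // line_bytes      # memory line (block) number
        index = line % num_lines       # direct mapping: exactly one possible slot
        if resident[index] != line:    # miss: the whole line is fetched
            resident[index] = line
            traffic += line_bytes
    return traffic


def compulsory_traffic(addresses, word_bytes=4):
    """Assumed lower bound: every distinct word is fetched exactly once."""
    return len({addr // word_bytes for addr in addresses}) * word_bytes


# Hypothetical trace: two words that map to the same cache index, read alternately.
trace = [0, 1024] * 8
print(direct_mapped_traffic(trace))    # 512 (all 16 accesses miss, 32 bytes each)
print(compulsory_traffic(trace))       # 8   (two distinct 4-byte words)
```

In this contrived trace the two addresses keep evicting each other, so the cache fetches 512 bytes where 8 would suffice. Conflict misses and whole-line fetches both contribute to that excess.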
Index Terms
- Dynamic techniques to reduce memory traffic in embedded systems