Article

Dynamic techniques to reduce memory traffic in embedded systems

Authors:
Ben Juurlink, Delft University of Technology, Delft, The Netherlands
Pepijn de Langen, Delft University of Technology, Delft, The Netherlands

CF '04: Proceedings of the 1st conference on Computing frontiers, April 2004, Pages 192–201, https://doi.org/10.1145/977091.977118

Published: 14 April 2004

ABSTRACT

Memory transfers, in particular to and from off-chip memories, consume a significant amount of power. To reduce the amount of off-chip memory traffic, one or more levels of cache can be placed on the same die as the processor core. For performance, energy, and cost reasons, it is expedient that the on-chip cache be small and direct-mapped. Small, direct-mapped caches, however, generally produce much more traffic than needed. The purpose of this paper is twofold. First, we measure how much traffic is generated by small, direct-mapped caches and what the minimal amount of traffic is. This yields an upper bound on the amount of traffic that can be saved by utilizing the on-chip memory more effectively. Second, we survey some techniques that can be deployed to reduce the amount of traffic produced by direct-mapped caches and present results for some of these techniques.
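
The gap the abstract describes, between the traffic a small direct-mapped cache generates and the minimal traffic, can be illustrated with a toy simulation. The sketch below is not the authors' methodology: it assumes a read-only address trace, counts only line fills (no write-backs), and uses a fully associative cache of the same capacity with Belady's optimal replacement as one possible stand-in for "minimal traffic". All function names and parameters are hypothetical.

```python
import math


def direct_mapped_traffic(trace, num_lines, line_size):
    """Bytes fetched from off-chip memory by a direct-mapped cache.

    Assumption: reads only, so traffic = misses * line_size.
    """
    tags = [None] * num_lines          # block number held by each cache line
    misses = 0
    for addr in trace:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # the single line this block may occupy
        if tags[index] != block:
            tags[index] = block        # fetch the block, evicting the old one
            misses += 1
    return misses * line_size


def optimal_traffic(trace, num_lines, line_size):
    """Bytes fetched by a fully associative cache of equal capacity that
    evicts the block referenced furthest in the future (Belady's policy).

    Assumption: this serves only as an illustrative lower-bound model.
    """
    blocks = [addr // line_size for addr in trace]

    # next_use[i] = position of the next reference to blocks[i], or infinity.
    next_use = [math.inf] * len(blocks)
    upcoming = {}
    for i in range(len(blocks) - 1, -1, -1):
        next_use[i] = upcoming.get(blocks[i], math.inf)
        upcoming[blocks[i]] = i

    cache = {}                         # resident block -> its next use
    misses = 0
    for i, block in enumerate(blocks):
        if block not in cache:
            misses += 1
            if len(cache) >= num_lines:
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[block] = next_use[i]
    return misses * line_size


if __name__ == "__main__":
    # Two addresses that map to the same line of a 4-line, 16-byte-line cache.
    trace = [0, 1024] * 8
    dm = direct_mapped_traffic(trace, num_lines=4, line_size=16)
    opt = optimal_traffic(trace, num_lines=4, line_size=16)
    print(f"direct-mapped: {dm} bytes, optimal: {opt} bytes")
```

In this contrived trace the two addresses keep evicting each other from the same cache line, so every access misses in the direct-mapped cache, whereas the fully associative model fetches each block only once. The difference between the two numbers plays the role of the upper bound on savings mentioned in the abstract.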

Index Terms

Dynamic techniques to reduce memory traffic in embedded systems
1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
CF '04: Proceedings of the 1st conference on Computing frontiers
April 2004
522 pages
ISBN:1581137419
DOI:10.1145/977091
General Chair:
Stamatis Vassiliadis
Delft University of Technology, The Netherlands
,
Program Chairs:
Jean-Luc Gaudiot
University of California at Irvine, USA
,
Vincenzo Piuri
University of Milan, Italy
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
caches
embedded processors
memory traffic
power consumption
Qualifiers
- Article

Acceptance Rates
Overall Acceptance Rate: 240 of 680 submissions, 35%
Article Metrics
- Total Citations: 3
- Total Downloads: 429
- Downloads (Last 12 months): 0
- Downloads (Last 6 weeks): 0