Dynamic techniques to reduce memory traffic in embedded systems

Published: 14 April 2004
DOI: 10.1145/977091.977118

ABSTRACT

Memory transfers, in particular those to and from off-chip memories, consume a significant amount of power. To reduce off-chip memory traffic, one or more levels of cache can be placed on the same die as the processor core. For performance, energy, and cost reasons, it is desirable that the on-chip cache be small and direct-mapped. Small, direct-mapped caches, however, generally produce much more traffic than necessary. The purpose of this paper is twofold. First, we measure how much traffic small, direct-mapped caches generate and what the minimal amount of traffic is. This yields an upper bound on the traffic that can be saved by using the on-chip memory more effectively. Second, we survey several techniques that can reduce the traffic produced by direct-mapped caches and present results for some of them.
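The notion of cache-generated off-chip traffic can be made concrete with a small simulation. The sketch below is not the authors' simulator: it assumes a write-back, write-allocate direct-mapped cache that always transfers whole lines, and the names `DirectMappedCache`, `TrafficStats`, and `fully_associative_lru_read_bytes` are hypothetical. The fully-associative LRU comparison only illustrates traffic caused by conflict misses; it is not the minimal-traffic bound measured in the paper.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class TrafficStats:
    bytes_read: int = 0     # bytes fetched from off-chip memory (line fills)
    bytes_written: int = 0  # bytes written back to off-chip memory (dirty evictions)

    @property
    def total(self) -> int:
        return self.bytes_read + self.bytes_written


class DirectMappedCache:
    """Direct-mapped, write-back, write-allocate cache (assumed policies)."""

    def __init__(self, size_bytes: int, line_bytes: int) -> None:
        if size_bytes % line_bytes != 0:
            raise ValueError("cache size must be a multiple of the line size")
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // line_bytes  # one line per set
        self.tags = [None] * self.num_sets        # None marks an invalid line
        self.dirty = [False] * self.num_sets
        self.stats = TrafficStats()

    def access(self, addr: int, is_write: bool = False) -> bool:
        """Access one byte address; return True on a hit."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_sets
        tag = line_addr // self.num_sets
        hit = self.tags[index] == tag
        if not hit:
            # Write back the victim if it was modified, then fetch the whole new line.
            if self.tags[index] is not None and self.dirty[index]:
                self.stats.bytes_written += self.line_bytes
            self.stats.bytes_read += self.line_bytes
            self.tags[index] = tag
            self.dirty[index] = False
        if is_write:
            self.dirty[index] = True
        return hit

    def flush(self) -> None:
        """Write back all remaining dirty lines (e.g., at program end)."""
        for i in range(self.num_sets):
            if self.tags[i] is not None and self.dirty[i]:
                self.stats.bytes_written += self.line_bytes
                self.dirty[i] = False


def fully_associative_lru_read_bytes(addrs, size_bytes: int, line_bytes: int) -> int:
    """Bytes fetched by a fully-associative LRU cache of equal capacity (reads only)."""
    capacity = size_bytes // line_bytes
    lines = OrderedDict()
    fetched = 0
    for addr in addrs:
        line = addr // line_bytes
        if line in lines:
            lines.move_to_end(line)  # mark as most recently used
        else:
            fetched += line_bytes
            lines[line] = None
            if len(lines) > capacity:
                lines.popitem(last=False)  # evict the least recently used line
    return fetched


if __name__ == "__main__":
    # Two addresses exactly one cache size apart map to the same set,
    # so in a 1 KB direct-mapped cache they evict each other on every access.
    trace = [0x0000, 0x0400] * 100
    dm = DirectMappedCache(size_bytes=1024, line_bytes=32)
    for a in trace:
        dm.access(a)
    print(dm.stats.total)                                     # 6400
    print(fully_associative_lru_read_bytes(trace, 1024, 32))  # 64
```

With two addresses 1 KB apart, every access to the direct-mapped cache misses and refetches a full line, whereas a fully-associative cache of the same capacity fetches each line only once. This ping-pong pattern is one source of the excess traffic that small, direct-mapped caches produce.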

        • Published in

          ACM Conferences
          CF '04: Proceedings of the 1st conference on Computing frontiers
          April 2004
          522 pages
          ISBN:1581137419
          DOI:10.1145/977091

          Copyright © 2004 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Article

          Acceptance Rates

          Overall acceptance rate: 240 of 680 submissions, 35%
