article

Dynamic allocation for scratch-pad memory using compile-time decisions

Authors:
Sumesh Udayakumaran

University of Maryland, College Park, MD

University of Maryland, College Park, MD
View Profile

,
Angel Dominguez

University of Maryland, College Park, MD

University of Maryland, College Park, MD
View Profile

,
Rajeev Barua

University of Maryland, College Park, MD

University of Maryland, College Park, MD
View Profile

Authors Info & Claims

ACM Transactions on Embedded Computing Systems Volume 5 Issue 2pp 472–511https://doi.org/10.1145/1151074.1151085

Published:01 May 2006Publication History

ACM Transactions on Embedded Computing Systems

Abstract

In this research, we propose a highly predictable, low overhead, and, yet, dynamic, memory-allocation strategy for embedded systems with scratch pad memory. A scratch pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme. Primarily scratch pad allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile-time into the two banks. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache. We propose a dynamic allocation methodology for global and stack data and program code that; (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no runtime checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the scratch pad using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal static allocation, results show that our scheme reduces runtime by up to 39.8% and energy by up to 31.3%, on average, for our benchmarks, depending on the SRAM size used. The actual gain depends on the SRAM size, but our results show that close to the maximum benefit in runtime and energy is achieved for a substantial range of small SRAM sizes commonly found in embedded systems. Our comparison with a direct mapped cache shows that our method performs roughly as well as a cached architecture.

References

Ahmed, N., Mateev, N., and Pingali, K. 2000. Tiling imperfectly-nested loop nests. In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (CDROM). IEEE Computer Society, 31.]] Google Scholar
Anantharaman, S. and Pande, S. 1998. Compiler optimization for real time execution of loops on limited memory embedded systems. In Proc. of the 19th IEEE Real-Time Systems Symposium.]] Google Scholar
Angiolini, F., Benini, L., and Caprara, A. 2003. Polynomial-time algorithm for on-chip scratchpad memory partitioning. In Proceedings of the 2003 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM Press, New York, 318--326.]] Google Scholar
Angiolini, F., Menichelli, F., Ferrero, A., Benini, L., and Olivieri, M. 2004. A post-compiler approach to scratchpad mapping of code. In Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. ACM Press, New York. 259--267.]] Google Scholar
Appel, A. W. and Ginsburg, M. 1998. Modern Compiler Implementation in C. Cambridge University Press, Cambridge.]] Google Scholar
Avissar, O., Barua, R., and Stewart, D. 2001. Heterogeneous memory management for embedded systems. In Proceedings of the ACM 2nd International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), Atlanta, GA. (Also at http://www.ece.umd.edu/~barua.)]] Google Scholar
Avissar, O., Barua, R., and Stewart, D. 2002. An optimal memory allocation scheme for scratch-pad based embedded systems. ACM Transactions on Embedded Systems (TECS) 1, 1 (Sept.).]] Google ScholarDigital Library
Banakar, R., Steinke, S., Lee, B.-S., Balakrishnan, M., and Marwedel, P. 2002. Scratchpad memory: A design alternative for cache on-chip memory in embedded systems. In Tenth International Symposium on Hardware/Software Codesign (CODES). Estes Park, Colorado, ACM Press, New York.]] Google Scholar
Baynes, K., Collins, C., Fiterman, E., Ganesh, B., Kohout, P., Smit, C., Zhang, T., and Jacob, B. 2001. The performance and energy consumption of three embedded real-time operating systems. In Proceedings of the 4th Workshop on Compiler and Architecture Support for Embedded Systems (CASES'01). Atlanta, GA. 203--210.]] Google Scholar
Baynes, K., Collins, C., Fiterman, E., Ganesh, B., Kohout, P., Smit, C., Zhang, T., and Jacob, B. 2003. The performance and energy consumption of embedded real-time operating systems. IEEE Transactions on Computers 52, 11 (Nov.), 1454--1469.]] Google ScholarDigital Library
Belady, L. 1966. A study of replacement algorithms for virtual storage. In IBM Systems Journal 5, 78--101.]]Google ScholarDigital Library
Bringmann, R. A. 1995. Compiler-controlled speculation. Ph.D. thesis, Department of Computer Science, University of Illinois, Urbana, IL.]]Google Scholar
Chilimbi, T. M., Davidson, B., and Larus, J. 1999. Cache-conscious structure definition. In ACM SIGPLAN Notices. ACM Press, New York. 13--24.]] Google Scholar
Dinero Cache Simulator Revised. 2004. DineroIV Cache Simulator. http://www.cs.wisc.edu/markhill/DineroIV/.]]Google Scholar
Dominguez, A., Udayakumaran, S., and Barua, R. 2005. Heap data allocation to scratch-pad memory in embedded systems. Journal of Embedded Computing(JEC) 1, 4, IOS Press, Amsterdam, Netherlands.]] Google Scholar
Eisenbeis, C., Jalby, W. D., and Fran, C. 1990. A strategy for array management in local memory. In Technical Report 1262, INRIA, Domaine de Voluceau, France.]]Google Scholar
Francesco, P., Marchal, P., Atienza, D., Luca Benini, F. C., and Mendias, J. M. 2004. An integrated hardware/software approach for runtime scratchpad management. In Proceedings of the Design Automation Conference. ACM Press, New York. 238--243.]] Google Scholar
Goodwin, D. W. and Wilken, K. D. 1996. Optimal and near-optimal global register allocation using 0-1 integer programming. In Software-Practice and Experience. 929--965.]] Google Scholar
Hallnor, G. and Reinhardt, S. K. 2000. A fully associative software-managed cache design. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA). Vancouver, British Columbia, Canada.]] Google Scholar
Hiser, J. D. and Davidson, J. W. 2004. Embarc: An efficient memory bank assignment algorithm for retargetable compilers. In Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems. ACM Press, New York. 182--191.]] Google Scholar
ILOG Corporation. 2001. The CPLEX optimization suite. http://www.ilog.com/products/cplex/.]]Google Scholar
Janzen, J. 2001. Calculating memory system power for DDR SDRAM. In DesignLine Journal 10(2). Micron Technology Inc. http://www.micron.com/publications/designline.html.]]Google Scholar
Kandemir, M., Ramanujam, J., Irwin, M. J., Vijaykrishnan, N., Kadayif, I., and Parikh, A. 2001. Dynamic management of scratch-pad memory space. In Design Automation Conference. 690--695.]] Google Scholar
Keitel-Sculz, D. and Wehn, N. 2001. Embedded DRAM development technology, physical design, and application issues. In IEEE Design and Test of Computers.]] Google Scholar
Larson, E. and Austin, T. 2000. Compiler controlled value prediction using branch predictor based confidence. In Proceedings of the 33th Annual International Symposium on Microarchitecture (MICRO-33). IEEE Computer Society.]] Google Scholar
Lctes Panel. 2003. Compilation challenges for network processors. Industrial Panel, ACM Conference on Languages, Compilers and Tools for Embedded Systems (LCTES). Slides at http://www.cs.purdue.edu/s3/LCTES03/.]]Google Scholar
Lim, A. W., Liao, S.-W., and Lam, M. S. 2001. Blocking and array contraction across arbitrarily nested loops using affine partitioning. In Proceedings of the eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. ACM Press, New York. 103--112.]] Google Scholar
Luk, C.-K. and Mowry, T. C. 1998. Cooperative instruction prefetching in modern processors. In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture. 182--194.]] Google Scholar
Micron-flash data sheet. 128Mb Q-Flash memory. Micron technology Inc. http://www.micron.com/products/nor/qflash/partlist.aspx.]]Google Scholar
Micron-datasheet. 2003. 128Mb DDR SDRAM data sheet. (Dual data-rate synchronous DRAM) Micron Technology Inc. http://www.micron.com/products/dram/ddrsdram/.]]Google Scholar
Moritz, C. A., Frank, M., and Amarasinghe, S. 2000. FlexCache: A framework for flexible compiler generated data caching. In The 2nd Workshop on Intelligent Memory Systems, Boston, MA.]] Google ScholarDigital Library
Panda, P. R., Dutt, N. D., and Nicolau, A. 2000. On-chip versus off-chip memory: The data partitioning problem in embedded processor-based systems. ACM Transactions on Design Automation of Electronic Systems 5, 3 (July).]] Google ScholarDigital Library
Schreiber, R. D. C. 2004. Near-optimal allocation of local memory arrays. In HPL-2004-24.]]Google Scholar
Sinha, A. and Chandrakasan, A. 2001. JouleTrack---A web based tool for software energy profiling. In Design Automation Conference. 220--225.]] Google Scholar
Sjodin, J. and Platen, C. V. 2001. Storage allocation for embedded processors. Compiler and Architecture Support for Embedded Computing Systems.]] Google Scholar
Sjodin, J., Froderberg, B., and Lindgren, T. 1998. Allocation of global data objects in on-chip RAM. Compiler and Architecture Support for Embedded Computing Systems.]]Google Scholar
Steinke, S., Grunwald, N., Wehmeyer, L., Banakar, R., Balakrishnan, M., and Marwedel, P. 2002a. Reducing energy consumption by dynamic copying of instructions onto on chip memory. In Proceedings of the 15th International Symposium on System Synthesis (ISSS) (Kyoto, Japan). ACM Press, New York.]] Google Scholar
Steinke, S., Wehmeyer, L., Lee, B., and Marwedel, P. 2002b. Assigning program and data objects to scratchpad for energy reduction. In Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, 409.]] Google Scholar
Tanenbaum, A. S. 1998. Structured Computer Organization (4th Edition). Prentice Hall, Englewood, Cliffs, NJ.]] Google Scholar
Tiwari, V. and Lee, M. T. C. 1998. Power analysis of a 32-bit embedded microcontroller. VLSI Design Journal 7, 33.]]Google Scholar
Udayakumaran, S., Narahari, B., and Simha, R. 2002. Application specific memory partitioning for low power. In Proceedings of ACM COLP 2002 (Compiler and Operating Systems for Low Power. ACM Press, New York.]]Google Scholar
Udayakumaran, S. and Barua, R. 2003. Compiler-decided dynamic memory allocation for scratch-pad based embedded systems. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES). ACM Press, New York. 276--286.]] Google Scholar
Udayakumaran, S. and Barua, R. 2006. An integrated scratch-pad allocator for affine and non-affine code. In Design, Automation and Test in Europe (DATE). IEEE Computer Society.]] Google Scholar
Verma, M., Wehmeyer, L., and Marwedel, P. 2004a. Cache-aware scratchpad allocation algorithm. In Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society.]] Google Scholar
Verma, M., Wehmeyer, L., and Marwedel, P. 2004b. Dynamic overlay of scratch-pad memory for energy minimization. In International Conference on Hardware/Software Codesign and System Synthesis(CODES+ISSS) (Stockholm, Sweden). ACM Press, New York.]] Google Scholar
Wehmeyer, L. and Marwedel, P. 2004. Influence of onchip scratch-pad memories on wcet prediction. In Proceedings of the 4th International Workshop on Worst-Case Execution Time (WCET) Analysis.]]Google Scholar
Wehmeyer, L., Helmig, U., and Marwedel, P. 2004. Compiler-optimized usage of partitioned memories. In Proceedings of the 3rd Workshop on Memory Performance Issues (WMPI2004).]] Google Scholar
Wilton, S. and Jouppi, N. 1996. Cacti: An enhanced cache access and cycle time model. In IEEE Journal of Solid-State Circuits.]]Google Scholar

Index Terms

Dynamic allocation for scratch-pad memory using compile-time decisions

Recommendations

Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
CASES '03: Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems

This paper presents a highly predictable, low overhead and yet dynamic, memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is ...
Read More
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
CASES '05: Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems

This paper presents the first memory allocation scheme for embedded systems having scratch-pad memory whose size is unknown at compile time. A scratch-pad memory (SPM) is a fast compiler-managed SRAM that replaces the hardware-managed cache. Its uses ...
Read More
Recursive function data allocation to scratch-pad memory
CASES '07: Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems

This paper presents the first automatic scheme to allocate local (stack) data in recursive functions to scratch-pad memory (SPM) in embedded systems. A scratch-pad is a fast directly addressed compiler-managed SRAM memory that replaces the hardware-...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Embedded Computing Systems Volume 5, Issue 2
May 2006
253 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/1151074
Issue’s Table of Contents

Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 1 May 2006
Published in tecs Volume 5, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Memory allocation
compiler
embedded systems
scratch pad
software caching
software-managed cache
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 197
  Total Citations
  View Citations
- 1,626
  Total Downloads
- Downloads (Last 12 months)30
- Downloads (Last 6 weeks)8
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Dynamic allocation for scratch-pad memory using compile-time decisions

ACM Transactions on Embedded Computing Systems

Abstract

References

Cited By

Index Terms

Recommendations

Compiler-decided dynamic memory allocation for scratch-pad based embedded systems

Memory allocation for embedded systems with a compile-time-unknown scratch-pad size

Recursive function data allocation to scratch-pad memory

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Journal Family

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Dynamic allocation for scratch-pad memory using compile-time decisions

ACM Transactions on Embedded Computing Systems

Abstract

References

Cited By

Index Terms

Recommendations

Compiler-decided dynamic memory allocation for scratch-pad based embedded systems

Memory allocation for embedded systems with a compile-time-unknown scratch-pad size

Recursive function data allocation to scratch-pad memory

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Journal Family

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media