Abstract
In this research, we propose a highly predictable, low overhead, and, yet, dynamic, memory-allocation strategy for embedded systems with scratch pad memory. A scratch pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme. Primarily scratch pad allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile-time into the two banks. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache. We propose a dynamic allocation methodology for global and stack data and program code that; (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no runtime checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the scratch pad using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal static allocation, results show that our scheme reduces runtime by up to 39.8% and energy by up to 31.3%, on average, for our benchmarks, depending on the SRAM size used. The actual gain depends on the SRAM size, but our results show that close to the maximum benefit in runtime and energy is achieved for a substantial range of small SRAM sizes commonly found in embedded systems. Our comparison with a direct mapped cache shows that our method performs roughly as well as a cached architecture.
- Ahmed, N., Mateev, N., and Pingali, K. 2000. Tiling imperfectly-nested loop nests. In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (CDROM). IEEE Computer Society, 31.]] Google Scholar
- Anantharaman, S. and Pande, S. 1998. Compiler optimization for real time execution of loops on limited memory embedded systems. In Proc. of the 19th IEEE Real-Time Systems Symposium.]] Google Scholar
- Angiolini, F., Benini, L., and Caprara, A. 2003. Polynomial-time algorithm for on-chip scratchpad memory partitioning. In Proceedings of the 2003 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM Press, New York, 318--326.]] Google Scholar
- Angiolini, F., Menichelli, F., Ferrero, A., Benini, L., and Olivieri, M. 2004. A post-compiler approach to scratchpad mapping of code. In Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. ACM Press, New York. 259--267.]] Google Scholar
- Appel, A. W. and Ginsburg, M. 1998. Modern Compiler Implementation in C. Cambridge University Press, Cambridge.]] Google Scholar
- Avissar, O., Barua, R., and Stewart, D. 2001. Heterogeneous memory management for embedded systems. In Proceedings of the ACM 2nd International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), Atlanta, GA. (Also at http://www.ece.umd.edu/~barua.)]] Google Scholar
- Avissar, O., Barua, R., and Stewart, D. 2002. An optimal memory allocation scheme for scratch-pad based embedded systems. ACM Transactions on Embedded Systems (TECS) 1, 1 (Sept.).]] Google ScholarDigital Library
- Banakar, R., Steinke, S., Lee, B.-S., Balakrishnan, M., and Marwedel, P. 2002. Scratchpad memory: A design alternative for cache on-chip memory in embedded systems. In Tenth International Symposium on Hardware/Software Codesign (CODES). Estes Park, Colorado, ACM Press, New York.]] Google Scholar
- Baynes, K., Collins, C., Fiterman, E., Ganesh, B., Kohout, P., Smit, C., Zhang, T., and Jacob, B. 2001. The performance and energy consumption of three embedded real-time operating systems. In Proceedings of the 4th Workshop on Compiler and Architecture Support for Embedded Systems (CASES'01). Atlanta, GA. 203--210.]] Google Scholar
- Baynes, K., Collins, C., Fiterman, E., Ganesh, B., Kohout, P., Smit, C., Zhang, T., and Jacob, B. 2003. The performance and energy consumption of embedded real-time operating systems. IEEE Transactions on Computers 52, 11 (Nov.), 1454--1469.]] Google ScholarDigital Library
- Belady, L. 1966. A study of replacement algorithms for virtual storage. In IBM Systems Journal 5, 78--101.]]Google ScholarDigital Library
- Bringmann, R. A. 1995. Compiler-controlled speculation. Ph.D. thesis, Department of Computer Science, University of Illinois, Urbana, IL.]]Google Scholar
- Chilimbi, T. M., Davidson, B., and Larus, J. 1999. Cache-conscious structure definition. In ACM SIGPLAN Notices. ACM Press, New York. 13--24.]] Google Scholar
- Dinero Cache Simulator Revised. 2004. DineroIV Cache Simulator. http://www.cs.wisc.edu/markhill/DineroIV/.]]Google Scholar
- Dominguez, A., Udayakumaran, S., and Barua, R. 2005. Heap data allocation to scratch-pad memory in embedded systems. Journal of Embedded Computing(JEC) 1, 4, IOS Press, Amsterdam, Netherlands.]] Google Scholar
- Eisenbeis, C., Jalby, W. D., and Fran, C. 1990. A strategy for array management in local memory. In Technical Report 1262, INRIA, Domaine de Voluceau, France.]]Google Scholar
- Francesco, P., Marchal, P., Atienza, D., Luca Benini, F. C., and Mendias, J. M. 2004. An integrated hardware/software approach for runtime scratchpad management. In Proceedings of the Design Automation Conference. ACM Press, New York. 238--243.]] Google Scholar
- Goodwin, D. W. and Wilken, K. D. 1996. Optimal and near-optimal global register allocation using 0-1 integer programming. In Software-Practice and Experience. 929--965.]] Google Scholar
- Hallnor, G. and Reinhardt, S. K. 2000. A fully associative software-managed cache design. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA). Vancouver, British Columbia, Canada.]] Google Scholar
- Hiser, J. D. and Davidson, J. W. 2004. Embarc: An efficient memory bank assignment algorithm for retargetable compilers. In Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems. ACM Press, New York. 182--191.]] Google Scholar
- ILOG Corporation. 2001. The CPLEX optimization suite. http://www.ilog.com/products/cplex/.]]Google Scholar
- Janzen, J. 2001. Calculating memory system power for DDR SDRAM. In DesignLine Journal 10(2). Micron Technology Inc. http://www.micron.com/publications/designline.html.]]Google Scholar
- Kandemir, M., Ramanujam, J., Irwin, M. J., Vijaykrishnan, N., Kadayif, I., and Parikh, A. 2001. Dynamic management of scratch-pad memory space. In Design Automation Conference. 690--695.]] Google Scholar
- Keitel-Sculz, D. and Wehn, N. 2001. Embedded DRAM development technology, physical design, and application issues. In IEEE Design and Test of Computers.]] Google Scholar
- Larson, E. and Austin, T. 2000. Compiler controlled value prediction using branch predictor based confidence. In Proceedings of the 33th Annual International Symposium on Microarchitecture (MICRO-33). IEEE Computer Society.]] Google Scholar
- Lctes Panel. 2003. Compilation challenges for network processors. Industrial Panel, ACM Conference on Languages, Compilers and Tools for Embedded Systems (LCTES). Slides at http://www.cs.purdue.edu/s3/LCTES03/.]]Google Scholar
- Lim, A. W., Liao, S.-W., and Lam, M. S. 2001. Blocking and array contraction across arbitrarily nested loops using affine partitioning. In Proceedings of the eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. ACM Press, New York. 103--112.]] Google Scholar
- Luk, C.-K. and Mowry, T. C. 1998. Cooperative instruction prefetching in modern processors. In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture. 182--194.]] Google Scholar
- Micron-flash data sheet. 128Mb Q-Flash memory. Micron technology Inc. http://www.micron.com/products/nor/qflash/partlist.aspx.]]Google Scholar
- Micron-datasheet. 2003. 128Mb DDR SDRAM data sheet. (Dual data-rate synchronous DRAM) Micron Technology Inc. http://www.micron.com/products/dram/ddrsdram/.]]Google Scholar
- Moritz, C. A., Frank, M., and Amarasinghe, S. 2000. FlexCache: A framework for flexible compiler generated data caching. In The 2nd Workshop on Intelligent Memory Systems, Boston, MA.]] Google ScholarDigital Library
- Panda, P. R., Dutt, N. D., and Nicolau, A. 2000. On-chip versus off-chip memory: The data partitioning problem in embedded processor-based systems. ACM Transactions on Design Automation of Electronic Systems 5, 3 (July).]] Google ScholarDigital Library
- Schreiber, R. D. C. 2004. Near-optimal allocation of local memory arrays. In HPL-2004-24.]]Google Scholar
- Sinha, A. and Chandrakasan, A. 2001. JouleTrack---A web based tool for software energy profiling. In Design Automation Conference. 220--225.]] Google Scholar
- Sjodin, J. and Platen, C. V. 2001. Storage allocation for embedded processors. Compiler and Architecture Support for Embedded Computing Systems.]] Google Scholar
- Sjodin, J., Froderberg, B., and Lindgren, T. 1998. Allocation of global data objects in on-chip RAM. Compiler and Architecture Support for Embedded Computing Systems.]]Google Scholar
- Steinke, S., Grunwald, N., Wehmeyer, L., Banakar, R., Balakrishnan, M., and Marwedel, P. 2002a. Reducing energy consumption by dynamic copying of instructions onto on chip memory. In Proceedings of the 15th International Symposium on System Synthesis (ISSS) (Kyoto, Japan). ACM Press, New York.]] Google Scholar
- Steinke, S., Wehmeyer, L., Lee, B., and Marwedel, P. 2002b. Assigning program and data objects to scratchpad for energy reduction. In Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, 409.]] Google Scholar
- Tanenbaum, A. S. 1998. Structured Computer Organization (4th Edition). Prentice Hall, Englewood, Cliffs, NJ.]] Google Scholar
- Tiwari, V. and Lee, M. T. C. 1998. Power analysis of a 32-bit embedded microcontroller. VLSI Design Journal 7, 33.]]Google Scholar
- Udayakumaran, S., Narahari, B., and Simha, R. 2002. Application specific memory partitioning for low power. In Proceedings of ACM COLP 2002 (Compiler and Operating Systems for Low Power. ACM Press, New York.]]Google Scholar
- Udayakumaran, S. and Barua, R. 2003. Compiler-decided dynamic memory allocation for scratch-pad based embedded systems. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES). ACM Press, New York. 276--286.]] Google Scholar
- Udayakumaran, S. and Barua, R. 2006. An integrated scratch-pad allocator for affine and non-affine code. In Design, Automation and Test in Europe (DATE). IEEE Computer Society.]] Google Scholar
- Verma, M., Wehmeyer, L., and Marwedel, P. 2004a. Cache-aware scratchpad allocation algorithm. In Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society.]] Google Scholar
- Verma, M., Wehmeyer, L., and Marwedel, P. 2004b. Dynamic overlay of scratch-pad memory for energy minimization. In International Conference on Hardware/Software Codesign and System Synthesis(CODES+ISSS) (Stockholm, Sweden). ACM Press, New York.]] Google Scholar
- Wehmeyer, L. and Marwedel, P. 2004. Influence of onchip scratch-pad memories on wcet prediction. In Proceedings of the 4th International Workshop on Worst-Case Execution Time (WCET) Analysis.]]Google Scholar
- Wehmeyer, L., Helmig, U., and Marwedel, P. 2004. Compiler-optimized usage of partitioned memories. In Proceedings of the 3rd Workshop on Memory Performance Issues (WMPI2004).]] Google Scholar
- Wilton, S. and Jouppi, N. 1996. Cacti: An enhanced cache access and cycle time model. In IEEE Journal of Solid-State Circuits.]]Google Scholar
Index Terms
- Dynamic allocation for scratch-pad memory using compile-time decisions
Recommendations
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
CASES '03: Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systemsThis paper presents a highly predictable, low overhead and yet dynamic, memory allocation strategy for embedded systems with scratch-pad memory. A scratch-pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is ...
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
CASES '05: Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systemsThis paper presents the first memory allocation scheme for embedded systems having scratch-pad memory whose size is unknown at compile time. A scratch-pad memory (SPM) is a fast compiler-managed SRAM that replaces the hardware-managed cache. Its uses ...
Recursive function data allocation to scratch-pad memory
CASES '07: Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systemsThis paper presents the first automatic scheme to allocate local (stack) data in recursive functions to scratch-pad memory (SPM) in embedded systems. A scratch-pad is a fast directly addressed compiler-managed SRAM memory that replaces the hardware-...
Comments