skip to main content
article

Dynamic allocation for scratch-pad memory using compile-time decisions

Published:01 May 2006Publication History
Skip Abstract Section

Abstract

In this research, we propose a highly predictable, low overhead, and, yet, dynamic, memory-allocation strategy for embedded systems with scratch pad memory. A scratch pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme. Primarily scratch pad allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile-time into the two banks. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache. We propose a dynamic allocation methodology for global and stack data and program code that; (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no runtime checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the scratch pad using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal static allocation, results show that our scheme reduces runtime by up to 39.8% and energy by up to 31.3%, on average, for our benchmarks, depending on the SRAM size used. The actual gain depends on the SRAM size, but our results show that close to the maximum benefit in runtime and energy is achieved for a substantial range of small SRAM sizes commonly found in embedded systems. Our comparison with a direct mapped cache shows that our method performs roughly as well as a cached architecture.

References

  1. Ahmed, N., Mateev, N., and Pingali, K. 2000. Tiling imperfectly-nested loop nests. In Proceedings of the 2000 ACM/IEEE Conference on Supercomputing (CDROM). IEEE Computer Society, 31.]] Google ScholarGoogle Scholar
  2. Anantharaman, S. and Pande, S. 1998. Compiler optimization for real time execution of loops on limited memory embedded systems. In Proc. of the 19th IEEE Real-Time Systems Symposium.]] Google ScholarGoogle Scholar
  3. Angiolini, F., Benini, L., and Caprara, A. 2003. Polynomial-time algorithm for on-chip scratchpad memory partitioning. In Proceedings of the 2003 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM Press, New York, 318--326.]] Google ScholarGoogle Scholar
  4. Angiolini, F., Menichelli, F., Ferrero, A., Benini, L., and Olivieri, M. 2004. A post-compiler approach to scratchpad mapping of code. In Proceedings of the 2004 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. ACM Press, New York. 259--267.]] Google ScholarGoogle Scholar
  5. Appel, A. W. and Ginsburg, M. 1998. Modern Compiler Implementation in C. Cambridge University Press, Cambridge.]] Google ScholarGoogle Scholar
  6. Avissar, O., Barua, R., and Stewart, D. 2001. Heterogeneous memory management for embedded systems. In Proceedings of the ACM 2nd International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES), Atlanta, GA. (Also at http://www.ece.umd.edu/~barua.)]] Google ScholarGoogle Scholar
  7. Avissar, O., Barua, R., and Stewart, D. 2002. An optimal memory allocation scheme for scratch-pad based embedded systems. ACM Transactions on Embedded Systems (TECS) 1, 1 (Sept.).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Banakar, R., Steinke, S., Lee, B.-S., Balakrishnan, M., and Marwedel, P. 2002. Scratchpad memory: A design alternative for cache on-chip memory in embedded systems. In Tenth International Symposium on Hardware/Software Codesign (CODES). Estes Park, Colorado, ACM Press, New York.]] Google ScholarGoogle Scholar
  9. Baynes, K., Collins, C., Fiterman, E., Ganesh, B., Kohout, P., Smit, C., Zhang, T., and Jacob, B. 2001. The performance and energy consumption of three embedded real-time operating systems. In Proceedings of the 4th Workshop on Compiler and Architecture Support for Embedded Systems (CASES'01). Atlanta, GA. 203--210.]] Google ScholarGoogle Scholar
  10. Baynes, K., Collins, C., Fiterman, E., Ganesh, B., Kohout, P., Smit, C., Zhang, T., and Jacob, B. 2003. The performance and energy consumption of embedded real-time operating systems. IEEE Transactions on Computers 52, 11 (Nov.), 1454--1469.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Belady, L. 1966. A study of replacement algorithms for virtual storage. In IBM Systems Journal 5, 78--101.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Bringmann, R. A. 1995. Compiler-controlled speculation. Ph.D. thesis, Department of Computer Science, University of Illinois, Urbana, IL.]]Google ScholarGoogle Scholar
  13. Chilimbi, T. M., Davidson, B., and Larus, J. 1999. Cache-conscious structure definition. In ACM SIGPLAN Notices. ACM Press, New York. 13--24.]] Google ScholarGoogle Scholar
  14. Dinero Cache Simulator Revised. 2004. DineroIV Cache Simulator. http://www.cs.wisc.edu/markhill/DineroIV/.]]Google ScholarGoogle Scholar
  15. Dominguez, A., Udayakumaran, S., and Barua, R. 2005. Heap data allocation to scratch-pad memory in embedded systems. Journal of Embedded Computing(JEC) 1, 4, IOS Press, Amsterdam, Netherlands.]] Google ScholarGoogle Scholar
  16. Eisenbeis, C., Jalby, W. D., and Fran, C. 1990. A strategy for array management in local memory. In Technical Report 1262, INRIA, Domaine de Voluceau, France.]]Google ScholarGoogle Scholar
  17. Francesco, P., Marchal, P., Atienza, D., Luca Benini, F. C., and Mendias, J. M. 2004. An integrated hardware/software approach for runtime scratchpad management. In Proceedings of the Design Automation Conference. ACM Press, New York. 238--243.]] Google ScholarGoogle Scholar
  18. Goodwin, D. W. and Wilken, K. D. 1996. Optimal and near-optimal global register allocation using 0-1 integer programming. In Software-Practice and Experience. 929--965.]] Google ScholarGoogle Scholar
  19. Hallnor, G. and Reinhardt, S. K. 2000. A fully associative software-managed cache design. In Proceedings of the 27th International Symposium on Computer Architecture (ISCA). Vancouver, British Columbia, Canada.]] Google ScholarGoogle Scholar
  20. Hiser, J. D. and Davidson, J. W. 2004. Embarc: An efficient memory bank assignment algorithm for retargetable compilers. In Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems. ACM Press, New York. 182--191.]] Google ScholarGoogle Scholar
  21. ILOG Corporation. 2001. The CPLEX optimization suite. http://www.ilog.com/products/cplex/.]]Google ScholarGoogle Scholar
  22. Janzen, J. 2001. Calculating memory system power for DDR SDRAM. In DesignLine Journal 10(2). Micron Technology Inc. http://www.micron.com/publications/designline.html.]]Google ScholarGoogle Scholar
  23. Kandemir, M., Ramanujam, J., Irwin, M. J., Vijaykrishnan, N., Kadayif, I., and Parikh, A. 2001. Dynamic management of scratch-pad memory space. In Design Automation Conference. 690--695.]] Google ScholarGoogle Scholar
  24. Keitel-Sculz, D. and Wehn, N. 2001. Embedded DRAM development technology, physical design, and application issues. In IEEE Design and Test of Computers.]] Google ScholarGoogle Scholar
  25. Larson, E. and Austin, T. 2000. Compiler controlled value prediction using branch predictor based confidence. In Proceedings of the 33th Annual International Symposium on Microarchitecture (MICRO-33). IEEE Computer Society.]] Google ScholarGoogle Scholar
  26. Lctes Panel. 2003. Compilation challenges for network processors. Industrial Panel, ACM Conference on Languages, Compilers and Tools for Embedded Systems (LCTES). Slides at http://www.cs.purdue.edu/s3/LCTES03/.]]Google ScholarGoogle Scholar
  27. Lim, A. W., Liao, S.-W., and Lam, M. S. 2001. Blocking and array contraction across arbitrarily nested loops using affine partitioning. In Proceedings of the eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. ACM Press, New York. 103--112.]] Google ScholarGoogle Scholar
  28. Luk, C.-K. and Mowry, T. C. 1998. Cooperative instruction prefetching in modern processors. In Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture. 182--194.]] Google ScholarGoogle Scholar
  29. Micron-flash data sheet. 128Mb Q-Flash memory. Micron technology Inc. http://www.micron.com/products/nor/qflash/partlist.aspx.]]Google ScholarGoogle Scholar
  30. Micron-datasheet. 2003. 128Mb DDR SDRAM data sheet. (Dual data-rate synchronous DRAM) Micron Technology Inc. http://www.micron.com/products/dram/ddrsdram/.]]Google ScholarGoogle Scholar
  31. Moritz, C. A., Frank, M., and Amarasinghe, S. 2000. FlexCache: A framework for flexible compiler generated data caching. In The 2nd Workshop on Intelligent Memory Systems, Boston, MA.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Panda, P. R., Dutt, N. D., and Nicolau, A. 2000. On-chip versus off-chip memory: The data partitioning problem in embedded processor-based systems. ACM Transactions on Design Automation of Electronic Systems 5, 3 (July).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Schreiber, R. D. C. 2004. Near-optimal allocation of local memory arrays. In HPL-2004-24.]]Google ScholarGoogle Scholar
  34. Sinha, A. and Chandrakasan, A. 2001. JouleTrack---A web based tool for software energy profiling. In Design Automation Conference. 220--225.]] Google ScholarGoogle Scholar
  35. Sjodin, J. and Platen, C. V. 2001. Storage allocation for embedded processors. Compiler and Architecture Support for Embedded Computing Systems.]] Google ScholarGoogle Scholar
  36. Sjodin, J., Froderberg, B., and Lindgren, T. 1998. Allocation of global data objects in on-chip RAM. Compiler and Architecture Support for Embedded Computing Systems.]]Google ScholarGoogle Scholar
  37. Steinke, S., Grunwald, N., Wehmeyer, L., Banakar, R., Balakrishnan, M., and Marwedel, P. 2002a. Reducing energy consumption by dynamic copying of instructions onto on chip memory. In Proceedings of the 15th International Symposium on System Synthesis (ISSS) (Kyoto, Japan). ACM Press, New York.]] Google ScholarGoogle Scholar
  38. Steinke, S., Wehmeyer, L., Lee, B., and Marwedel, P. 2002b. Assigning program and data objects to scratchpad for energy reduction. In Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, 409.]] Google ScholarGoogle Scholar
  39. Tanenbaum, A. S. 1998. Structured Computer Organization (4th Edition). Prentice Hall, Englewood, Cliffs, NJ.]] Google ScholarGoogle Scholar
  40. Tiwari, V. and Lee, M. T. C. 1998. Power analysis of a 32-bit embedded microcontroller. VLSI Design Journal 7, 33.]]Google ScholarGoogle Scholar
  41. Udayakumaran, S., Narahari, B., and Simha, R. 2002. Application specific memory partitioning for low power. In Proceedings of ACM COLP 2002 (Compiler and Operating Systems for Low Power. ACM Press, New York.]]Google ScholarGoogle Scholar
  42. Udayakumaran, S. and Barua, R. 2003. Compiler-decided dynamic memory allocation for scratch-pad based embedded systems. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES). ACM Press, New York. 276--286.]] Google ScholarGoogle Scholar
  43. Udayakumaran, S. and Barua, R. 2006. An integrated scratch-pad allocator for affine and non-affine code. In Design, Automation and Test in Europe (DATE). IEEE Computer Society.]] Google ScholarGoogle Scholar
  44. Verma, M., Wehmeyer, L., and Marwedel, P. 2004a. Cache-aware scratchpad allocation algorithm. In Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society.]] Google ScholarGoogle Scholar
  45. Verma, M., Wehmeyer, L., and Marwedel, P. 2004b. Dynamic overlay of scratch-pad memory for energy minimization. In International Conference on Hardware/Software Codesign and System Synthesis(CODES+ISSS) (Stockholm, Sweden). ACM Press, New York.]] Google ScholarGoogle Scholar
  46. Wehmeyer, L. and Marwedel, P. 2004. Influence of onchip scratch-pad memories on wcet prediction. In Proceedings of the 4th International Workshop on Worst-Case Execution Time (WCET) Analysis.]]Google ScholarGoogle Scholar
  47. Wehmeyer, L., Helmig, U., and Marwedel, P. 2004. Compiler-optimized usage of partitioned memories. In Proceedings of the 3rd Workshop on Memory Performance Issues (WMPI2004).]] Google ScholarGoogle Scholar
  48. Wilton, S. and Jouppi, N. 1996. Cacti: An enhanced cache access and cycle time model. In IEEE Journal of Solid-State Circuits.]]Google ScholarGoogle Scholar

Index Terms

  1. Dynamic allocation for scratch-pad memory using compile-time decisions

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM Transactions on Embedded Computing Systems
            ACM Transactions on Embedded Computing Systems  Volume 5, Issue 2
            May 2006
            253 pages
            ISSN:1539-9087
            EISSN:1558-3465
            DOI:10.1145/1151074
            Issue’s Table of Contents

            Copyright © 2006 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 May 2006
            Published in tecs Volume 5, Issue 2

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader