
A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications

Published: 27 September 2003

Abstract

The ever-increasing gap between processor and memory speed is also an issue in embedded systems, because of the growing complexity of multimedia processing and the strict resource constraints of these devices. Profile-driven code optimization techniques can be effectively employed to tune the application-cache interaction and the performance of the cache system itself, since applications running on such systems are usually known in advance and do not change over time. In a previous paper, we presented a profile-based code restructuring technique (CAT) that was able to dramatically increase the cache exploitation of embedded applications. However, it is well known that profile-driven optimizations can suffer from input-sensitivity problems: an application that is optimized for a particular input can perform even worse than the original one when subjected to other inputs. In this paper we consider jpeg and mpeg compressor/decompressor applications and analyze the input-sensitivity of CAT-improved layouts over a wide range of inputs. The input sets were carefully determined through both black-box and white-box analysis of the applications. We propose two metrics for measuring the input-sensitivity of application layouts, and show how our profile-driven code transformation technique is able to reduce the input-sensitivity of the considered applications by up to 48% on caches ranging from 1 KByte to 8 KByte.



  • Published in

    ACM SIGARCH Computer Architecture News, Volume 32, Issue 3
    Special issue: MEDEA-2003 workshop
    June 2004
    81 pages
    ISSN:0163-5964
    DOI:10.1145/1024295
    • ACM Conferences
      MEDEA '03: Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture
      September 2003
      75 pages
      ISBN:9781450378208
      DOI:10.1145/1152923

    Copyright © 2003 Authors

    Publisher

    Association for Computing Machinery

    New York, NY, United States


