Abstract
The ever-increasing gap between processor and memory speed is an issue in embedded systems as well, because of the growing complexity of multimedia processing and the strict resource constraints of these devices. Profile-driven code optimization techniques can be effectively employed to tune the interaction between application and cache, and the performance of the cache system itself, because the applications running on such systems are usually known in advance and do not change over time. In a previous paper, we presented a profile-based code restructuring technique (CAT) that was able to dramatically increase cache exploitation by embedded applications. However, it is well known that profile-driven optimizations can suffer from input-sensitivity problems: an application that is optimized for a particular input can perform even worse than the original one when subjected to other inputs. In this paper we consider JPEG and MPEG compressor/decompressor applications and analyze the input-sensitivity of CAT-improved layouts over a wide range of inputs. The input sets were carefully determined through both black-box and white-box analysis of the applications. We propose two metrics for measuring the input-sensitivity of application layouts, and show that our profile-driven code transformation technique is able to reduce the input-sensitivity of the considered applications by up to 48% on caches ranging from 1 KByte to 8 KByte.
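To make the notion of input-sensitivity concrete, the following is a minimal sketch, not the CAT technique or either of the paper's two metrics (the abstract does not define them). It assumes a direct-mapped instruction cache, a trace given as (procedure, offset) fetches, and a layout that assigns each procedure a base address. The names `count_misses` and `miss_rate_spread`, and the "max minus min miss rate" measure, are hypothetical and chosen only for illustration.

```python
def count_misses(trace, layout, cache_size=1024, line_size=32):
    """Count misses of a direct-mapped instruction cache (illustrative model).

    trace  : sequence of (procedure, offset) instruction fetches
    layout : dict mapping procedure name -> base address in memory
    """
    num_lines = cache_size // line_size
    tags = [None] * num_lines              # memory block currently held by each line
    misses = 0
    for proc, offset in trace:
        block = (layout[proc] + offset) // line_size
        index = block % num_lines          # direct-mapped placement
        if tags[index] != block:           # cold or conflict miss
            tags[index] = block
            misses += 1
    return misses


def miss_rate_spread(traces, layout, **cache):
    """Hypothetical sensitivity measure: max minus min miss rate over the inputs."""
    rates = [count_misses(t, layout, **cache) / len(t) for t in traces]
    return max(rates) - min(rates)


# Two inputs: one alternates between procedures A and B, one only runs A.
trace_ab = [("A", 0), ("B", 0)] * 4
trace_a = [("A", 0)] * 8

conflicting = {"A": 0, "B": 1024}   # A and B map to the same cache line
separated = {"A": 0, "B": 32}       # A and B map to different cache lines

print(miss_rate_spread([trace_ab, trace_a], conflicting))
print(miss_rate_spread([trace_ab, trace_a], separated))
```

In this toy setting the conflicting layout behaves very differently on the two inputs, while the separated layout behaves almost the same on both. A layout whose miss rate varies little across inputs is, in this informal sense, less input-sensitive.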
A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications
MEDEA '03: Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture