ABSTRACT
Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long-latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited domains, such as scientific codes that access dense arrays in loop nests, a similar level of success has eluded general-purpose programs, especially pointer-chasing codes written in languages such as C and C++. We address this problem by describing, implementing, and evaluating a dynamic prefetching scheme. Our technique runs on stock hardware, is completely automatic, and works for general-purpose programs, including pointer-chasing codes written in weakly-typed languages such as C and C++. It operates in three phases: profiling, analysis and optimization, and hibernation. First, the profiling phase gathers a temporal data reference profile from a running program with low overhead. Next, profiling is turned off and a fast analysis algorithm extracts hot data streams, which are data reference sequences that frequently repeat in the same order, from the temporal profile. The system then dynamically injects code at appropriate program points to detect and prefetch these hot data streams. Finally, the process enters the hibernation phase, in which no profiling or analysis is performed and the program continues to execute with the added prefetch instructions. At the end of the hibernation phase, the program is de-optimized to remove the inserted checks and prefetch instructions, and control returns to the profiling phase. For long-running programs, this profile, analyze and optimize, hibernate cycle repeats multiple times. Our initial results from applying dynamic prefetching are promising, indicating overall execution time improvements of 5-19% for several memory-performance-limited SPECint2000 benchmarks running their largest (ref) inputs.
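The analyze-then-detect idea described above can be illustrated with a small sketch. This is not the paper's analysis algorithm or its injected detection code, neither of which the abstract specifies. It is a naive stand-in that counts fixed-length windows in a recorded reference trace and then matches stream prefixes at run time. The names `find_hot_streams`, `PrefixDetector`, `length`, `min_count`, and `head_len` are hypothetical, and the returned addresses stand in for the prefetch instructions a real system would issue.

```python
from collections import Counter


def find_hot_streams(trace, length, min_count):
    """Naive stand-in for hot data stream analysis (an assumption, not the
    paper's algorithm): return every window of `length` consecutive
    references that occurs at least `min_count` times in `trace`."""
    counts = Counter(
        tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)
    )
    return [stream for stream, n in counts.items() if n >= min_count]


class PrefixDetector:
    """Hypothetical run-time check: once the first `head_len` references of
    a hot stream have been observed in order, report the rest of that
    stream as prefetch targets."""

    def __init__(self, streams, head_len):
        # Map each stream's head (prefix) to its tail (addresses to prefetch).
        self.table = {s[:head_len]: s[head_len:] for s in streams}
        self.head_len = head_len
        self.window = ()

    def observe(self, addr):
        # Keep only the most recent `head_len` references.
        self.window = (self.window + (addr,))[-self.head_len:]
        # An empty tuple means no hot stream prefix has been matched.
        return self.table.get(self.window, ())


if __name__ == "__main__":
    # A toy profile in which one four-reference sequence repeats three times.
    trace = [0x10, 0x20, 0x30, 0x40, 0x99,
             0x10, 0x20, 0x30, 0x40, 0x77,
             0x10, 0x20, 0x30, 0x40]
    hot = find_hot_streams(trace, length=4, min_count=3)
    detector = PrefixDetector(hot, head_len=2)
    for addr in (0x10, 0x20):
        targets = detector.observe(addr)
    print([hex(a) for a in targets])
```

In this toy run the detector stays silent after the first reference and reports the two remaining addresses of the repeating sequence once the second reference arrives. A shorter `head_len` triggers prefetching earlier but matches more often by accident, and a longer one does the opposite.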
Index Terms
- Dynamic hot data stream prefetching for general-purpose programs