- 1. D. Callahan, K. Kennedy, and A. Porterfield, "Software Prefetching," In the Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991.
- 2. T.C. Mowry, M.S. Lam, and A. Gupta, "Design and evaluation of a compiler algorithm for prefetching," In the Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1992.
- 3. A.C. Klaiber and H.M. Levy, "An Architecture for Software-Controlled Data Prefetching," In the Proceedings of the 18th Annual International Symposium on Computer Architecture, 1991.
- 4. A.J. Smith, "Cache Memories," ACM Computing Surveys, vol. 14, no. 3, September 1982.
- 5. N.P. Jouppi, "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," In the Proceedings of the 17th Annual International Symposium on Computer Architecture, May 1990.
- 6. J.L. Baer and T.F. Chen, "An Effective On-Chip Preloading Scheme To Reduce Data Access Penalty," In the Proceedings of Supercomputing, 1991.
- 7. J.W.C. Fu and J.H. Patel, "Stride directed prefetching in scalar processors," In the Proceedings of the 25th International Symposium on Microarchitecture, December 1992.
- 8. R. Bianchini and T.J. LeBlanc, "A Preliminary Evaluation of Cache-Miss-Initiated Prefetching Techniques in Scalable Multiprocessors," University of Rochester Computer Science Department Technical Report 515, May 1994.
- 9. T.F. Chen, "An Effective Programmable Prefetch Engine for On-Chip Caches," In the Proceedings of the 28th International Symposium on Microarchitecture, 1995.
- 10. J. Pierce and T. Mudge, "Wrong-Path Instruction Prefetching," In the Proceedings of the 29th International Symposium on Microarchitecture, 1996.
- 11. M. Lipasti, W. Schmidt, S. Kunkel, and R. Roediger, "SPAID: Software Prefetching in Pointer- and Call-Intensive Environments," In the Proceedings of the 28th International Symposium on Microarchitecture, 1995.
- 12. C.K. Luk and T.C. Mowry, "Compiler-Based Prefetching for Recursive Data Structures," In the Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, October 1996.
- 13. A. Eustace and A. Srivastava, "ATOM: A Flexible Interface for Building High Performance Program Analysis Tools," Digital Equipment Corporation Western Research Laboratory Technical Note TN-44, July 1994.
- 14. D. Kroft, "Lockup-Free Instruction Fetch/Prefetch Cache Organization," In the Proceedings of the 8th International Symposium on Computer Architecture, May 1981.
- 15. J. Dundas and T. Mudge, "Using Stall Cycles to Improve Microprocessor Performance," University of Michigan Department of Electrical Engineering and Computer Science Technical Report CSE-TR-301-96, September 1996.