Index Terms
- Design and evaluation of a compiler algorithm for prefetching