Abstract
Profiling can accurately analyze program behavior for select data inputs. We show that profiling can also predict program locality for inputs other than the profiled ones. Here locality is defined by the distance of data reuse. Studying whole-program data reuse may reveal global patterns not apparent in short-distance reuses or local control flow. However, the analysis must meet two requirements to be useful. The first is efficiency: it needs to analyze all accesses to all data elements in full-size benchmarks and to measure distances of any length at any required precision. The second is prediction: based on a few training runs, it needs to classify patterns as regular or irregular and, for regular ones, predict their (changing) behavior for other inputs. In this paper, we show that these goals are attainable through three techniques: approximate analysis of reuse distance (originally called LRU stack distance), pattern recognition, and distance-based sampling. When tested on 15 integer and floating-point programs from SPEC and other benchmark suites, our techniques predict with 94% accuracy on average for data inputs up to hundreds of times larger than the training inputs. Based on these results, the paper discusses possible uses of this analysis.
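To make the notion of reuse distance concrete, the following is a minimal sketch that assumes the usual LRU stack distance definition: the number of distinct data elements accessed between two consecutive accesses to the same element. It is an exact, naive computation for illustration only, not the approximate analysis described in the abstract. The function names `reuse_distances` and `distance_histogram` are hypothetical.

```python
from collections import Counter


def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    A first access to an element has no earlier use, so it is reported as None.
    Naive exact method: O(N * M) time for N accesses and M distinct elements.
    """
    stack = []       # LRU stack: least recently used first, most recent last
    distances = []
    for element in trace:
        if element in stack:
            position = stack.index(element)
            # Elements above `position` are the distinct elements
            # accessed since the previous use of `element`.
            distances.append(len(stack) - 1 - position)
            stack.pop(position)
        else:
            distances.append(None)
        stack.append(element)  # `element` is now the most recently used
    return distances


def distance_histogram(distances):
    """Count how many reuses fall at each distance, ignoring first accesses."""
    return Counter(d for d in distances if d is not None)


if __name__ == "__main__":
    trace = list("abcab")
    print(reuse_distances(trace))
    print(distance_histogram(reuse_distances(trace)))
```

A histogram of this kind is one way to summarize the locality of a whole run. The linear stack search keeps the sketch short, but it would be too slow for full-size benchmarks.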
Predicting whole-program locality through reuse distance analysis
PLDI '03: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation