Predicting whole-program locality through reuse distance analysis

Authors:
Chen Ding, University of Rochester, Rochester, New York
Yutao Zhong, University of Rochester, Rochester, New York

ACM SIGPLAN Notices, Volume 38, Issue 5, May 2003, pp. 245–257. https://doi.org/10.1145/780822.781159

Published: 09 May 2003

Abstract

Profiling can accurately analyze program behavior for selected data inputs. We show that profiling can also predict program locality for inputs other than the profiled ones. Here, locality is defined by the distance of data reuse. Studying whole-program data reuse may reveal global patterns not apparent in short-distance reuses or local control flow. However, the analysis must meet two requirements to be useful. The first is efficiency: it needs to analyze all accesses to all data elements in full-size benchmarks and to measure distances of any length at any required precision. The second is prediction: based on a few training runs, it needs to classify patterns as regular or irregular and, for regular ones, predict their (changing) behavior for other inputs. In this paper, we show that these goals are attainable through three techniques: approximate analysis of reuse distance (originally called LRU stack distance), pattern recognition, and distance-based sampling. When tested on 15 integer and floating-point programs from SPEC and other benchmark suites, our techniques predict with 94% accuracy on average for data inputs up to hundreds of times larger than the training inputs. Based on these results, the paper discusses possible uses of this analysis.
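To make the central metric concrete, the sketch below computes reuse distance (LRU stack distance) for a short access trace: for each access, it counts the distinct data elements touched since the previous access to the same element. This is a naive exact computation for illustration only, not the approximate analysis the paper describes. The function names `reuse_distances` and `lru_hit_count` and the example trace are hypothetical.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    The distance is the number of distinct elements accessed between two
    consecutive accesses to the same element. First accesses have no
    previous use and are reported as None (an infinite distance).
    Illustrative only: this scan costs O(M) per access for M distinct
    elements, which is too slow for full-size benchmarks.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for element in trace:
        if element in stack:
            keys = list(stack)
            position = keys.index(element)
            # Everything after `position` was touched since the last use.
            distances.append(len(keys) - position - 1)
            stack.move_to_end(element)
        else:
            distances.append(None)
            stack[element] = True
    return distances


def lru_hit_count(trace, capacity):
    """Count hits in a fully associative LRU cache holding `capacity` elements.

    An access hits exactly when its reuse distance is smaller than the
    cache capacity.
    """
    return sum(
        1 for d in reuse_distances(trace) if d is not None and d < capacity
    )


if __name__ == "__main__":
    trace = "abcabb"  # hypothetical trace: each letter is one data element
    print(reuse_distances(trace))
    print(lru_hit_count(trace, capacity=2))
```

Because one pass over the trace yields the distance of every access, the same result can be reused to estimate hit counts for any fully associative LRU cache size, which is one reason the metric is attractive for whole-program analysis.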

Index Terms

Software and its engineering > Software notations and tools > Compilers

Published in
ACM SIGPLAN Notices Volume 38, Issue 5
May 2003
349 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/780822
PLDI '03: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
June 2003
360 pages
ISBN:1581136625
DOI:10.1145/781131
General Chair: Ron Cytron, Washington University, USA
Program Chair: Rajiv Gupta, University of Arizona, USA
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data locality
pattern recognition
prediction
profiling
program locality
reuse distance
sampling
stack distance
training
Qualifiers
- article

Article Metrics
- Total Citations: 243
- Total Downloads: 2,540
- Downloads (Last 12 months): 60
- Downloads (Last 6 weeks): 5