research-article
Open Access

Program locality analysis using reuse distance

Published: 26 August 2009

Abstract

On modern computer systems, the memory performance of an application depends on its locality. For a single execution, locality-correlated measures like average miss rate or working-set size have long been analyzed using reuse distance—the number of distinct locations accessed between consecutive accesses to a given location. This article addresses the analysis problem at the program level, where the size of data and the locality of execution may change significantly depending on the input.
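The definition of reuse distance given above can be made concrete with a small sketch. The function names `reuse_distances` and `lru_miss_rate` are hypothetical and chosen only for illustration. The code uses a naive quadratic scan rather than any of the efficient measurement methods the article discusses, and it assumes a fully associative LRU cache, for which an access hits exactly when its reuse distance is smaller than the cache capacity.

```python
def reuse_distances(trace):
    """Return the reuse distance of every access in `trace`.

    The distance is the number of distinct locations accessed between
    the previous access to the same location and the current one.
    First-time accesses have no previous access and are marked None.
    """
    last_seen = {}   # location -> index of its most recent access
    result = []
    for i, loc in enumerate(trace):
        if loc in last_seen:
            # Locations touched strictly between the two accesses.
            between = trace[last_seen[loc] + 1 : i]
            result.append(len(set(between)))
        else:
            result.append(None)
        last_seen[loc] = i
    return result


def lru_miss_rate(distances, capacity):
    """Miss rate of a fully associative LRU cache holding `capacity` locations.

    Assumption for this sketch: an access misses if it is a first access
    or if its reuse distance is at least the cache capacity.
    """
    misses = sum(1 for d in distances if d is None or d >= capacity)
    return misses / len(distances)


trace = list("abcab")
dists = reuse_distances(trace)
print(dists)                     # [None, None, None, 2, 2]
print(lru_miss_rate(dists, 3))   # 0.6
print(lru_miss_rate(dists, 2))   # 1.0
```

In this toy trace, the second accesses to `a` and `b` each have two distinct locations in between, so both hit in a cache of three locations and miss in a cache of two.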

The article presents two techniques that predict how the locality of a program changes with its input. The first is approximate reuse-distance measurement, which is asymptotically faster than exact methods while providing a guaranteed precision. The second is statistical prediction of locality in all executions of a program based on the analysis of a few executions. The prediction process has three steps: dividing data accesses into groups, finding the access patterns in each group, and building parameterized models. The resulting prediction may be used on-line with the help of distance-based sampling. When evaluated on fifteen benchmark applications, the new techniques predicted program locality with good accuracy, even for test executions that are orders of magnitude larger than the training executions.
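The three prediction steps named above (grouping accesses, finding a pattern per group, building a parameterized model) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the equal-size grouping of sorted reuse distances, the linear model `d = c + e * s` in the data size `s`, the use of exactly two training runs, and the helper names `group_means`, `fit_linear`, and `predict`. The training data are made up.

```python
def group_means(distances, num_groups):
    """Sort reuse distances and average them within equal-size groups."""
    ordered = sorted(distances)
    size = len(ordered) // num_groups
    return [sum(ordered[g * size:(g + 1) * size]) / size
            for g in range(num_groups)]


def fit_linear(s1, means1, s2, means2):
    """For each group, fit d = c + e * s through two training points.

    s1 and s2 are the data sizes of the two training runs; means1 and
    means2 are the per-group average reuse distances of those runs.
    """
    models = []
    for d1, d2 in zip(means1, means2):
        e = (d2 - d1) / (s2 - s1)   # growth of distance with data size
        c = d1 - e * s1             # part that does not depend on size
        models.append((c, e))
    return models


def predict(models, s):
    """Predict the per-group average reuse distance for data size s."""
    return [c + e * s for c, e in models]


# Hypothetical training runs: half the accesses have short, constant
# reuse distances; the other half grow with the data size.
run_small = [1, 1, 2, 2, 100, 100, 100, 100]   # data size 100
run_large = [1, 1, 2, 2, 200, 200, 200, 200]   # data size 200

models = fit_linear(100, group_means(run_small, 2),
                    200, group_means(run_large, 2))
predicted = predict(models, 1000)
print(predicted)   # [1.5, 1000.0]
```

In this toy setting the first group keeps a constant distance and the second grows in proportion to the data size, so the fitted model extrapolates to an input ten times larger than the first training run.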

The two techniques are among the first to enable quantitative analysis of whole-program locality in general sequential code. These findings form the basis for a unified understanding of program locality and its many facets. Concluding sections of the article present a taxonomy of related literature along five dimensions of locality and discuss the role of reuse distance in performance modeling, program optimization, cache and virtual memory management, and network traffic analysis.

    • Published in

      ACM Transactions on Programming Languages and Systems  Volume 31, Issue 6
      August 2009
      162 pages
      ISSN:0164-0925
      EISSN:1558-4593
      DOI:10.1145/1552309
      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 26 August 2009
      • Accepted: 1 May 2007
      • Revised: 1 September 2005
      • Received: 1 September 2004
      Qualifiers

      • research-article
      • Research
      • Refereed
