Analytical modeling of cache behavior for affine programs

Published: 27 December 2017

Abstract

Optimizing compilers implement program transformation strategies aimed at reducing data movement to and from main memory by exploiting the data-cache hierarchy. However, instead of attempting to minimize the number of cache misses directly, they rely on very approximate cost models, because precise compile-time models of misses in hierarchical caches are lacking. The current state of practice for cache miss analysis is based on accurate simulation. However, simulation requires time proportional to the dataset/problem size, and it must be repeated for each distinct cache configuration of interest.

This paper takes a fundamentally different approach, by focusing on polyhedral programs with static control flow. Instead of relying on costly simulation, a closed-form solution for modeling misses in a set-associative cache hierarchy is developed. This solution can guide the choice of program transformations at compile time to reduce cache misses. A tool implementing the approach has been developed and used to validate the framework.
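To make the contrast between simulation and a closed-form count concrete, here is a minimal sketch in Python. It is not the paper's model or tool. All names (`simulate_misses`, `sweep_trace`, `row_sweep_closed_form`) and the cache parameters (32 KiB, 64-byte lines, 8 ways, LRU replacement, a single cache level) are assumptions chosen for illustration. The simulator walks every access of a two-deep affine loop nest over an `n × n` array of 8-byte elements, so its cost grows with the problem size. For the simplest case, a row-major sweep from a line-aligned base address, the miss count can instead be written down directly as the number of distinct cache lines touched.

```python
from collections import OrderedDict


def simulate_misses(addresses, cache_bytes=32 * 1024, line_bytes=64, ways=8):
    """Count misses of one set-associative LRU cache over an address trace.

    Cache geometry and LRU replacement are illustrative assumptions.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        lru = sets[line % num_sets]
        if line in lru:
            lru.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) == ways:
                lru.popitem(last=False)  # evict the least recently used line
            lru[line] = True
    return misses


def sweep_trace(n, row_major=True, elem_bytes=8):
    """Addresses read by a two-deep loop nest over a row-major n x n array.

    row_major=True  models: for i: for j: read A[i][j]
    row_major=False models: for j: for i: read A[i][j]
    The array is assumed to start at address 0 (line aligned).
    """
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if row_major else (inner, outer)
            yield (i * n + j) * elem_bytes


def row_sweep_closed_form(n, elem_bytes=8, line_bytes=64):
    """Misses of the row-major sweep: each line is touched in one
    contiguous run, so the count is the number of distinct lines (a ceiling)."""
    return (n * n * elem_bytes + line_bytes - 1) // line_bytes


if __name__ == "__main__":
    n = 100
    print("simulated, row-major   :", simulate_misses(sweep_trace(n)))
    print("closed form, row-major :", row_sweep_closed_form(n))
    print("simulated, column-major:",
          simulate_misses(sweep_trace(n, row_major=False)))
```

The closed-form function runs in constant time regardless of `n`, whereas the simulator must be rerun for every problem size and every cache geometry. The column-major sweep has no equally simple formula, because its misses depend on how lines map to sets and on reuse distances; that is the kind of case where a general analytical model is needed.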

    • Published in

      Proceedings of the ACM on Programming Languages, Volume 2, Issue POPL
      January 2018
      1961 pages
      EISSN: 2475-1421
      DOI: 10.1145/3177123

      Copyright © 2017 ACM

      © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
