ABSTRACT
High-level data structures are a cornerstone of modern programming, yet they also stand in the way of compiler optimizations. To reason about user- or library-defined data structures, compilers need to be extensible. Common mechanisms for extending compilers fall into two categories. Front-end macros, staging, or partial evaluation systems can be used to programmatically remove abstraction and specialize programs before they enter the compiler. Alternatively, some compilers allow their internal workings to be extended by adding new transformation passes at different points in the compile chain or by adding new intermediate representation (IR) types. Neither mechanism alone is sufficient to handle the challenges posed by high-level data structures. This paper shows a novel way to combine them to yield benefits that are greater than the sum of the parts.
Instead of using staging merely as a front end, we also implement internal compiler passes using staging. These internal passes delegate back to program execution to construct the transformed IR. Staging is known to simplify program generation, and in the same way it can simplify program transformation. Defining a transformation as a staged IR interpreter is simpler than implementing a low-level IR-to-IR transformer. With custom IR nodes, many optimizations that are expressed as rewritings from IR nodes to staged program fragments can be combined into a single pass, mitigating phase-ordering problems. Speculative rewriting can preserve optimistic assumptions around loops.
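The idea of a transformation written as a staged IR interpreter can be illustrated with a minimal sketch. It is not the paper's implementation: the toy IR (`Const`, `Var`, `Add`, `Mul`), the staged operations `add` and `mul`, and the function `transform` are all hypothetical names chosen for this example. The transformer simply interprets the old IR, but every operation it executes is a staged one that either rewrites on the spot or emits a new IR node, so several rewrites happen in one pass.

```python
from dataclasses import dataclass

# Hypothetical toy IR; these node types are assumptions for illustration only.
@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    lhs: object
    rhs: object

@dataclass(frozen=True)
class Mul:
    lhs: object
    rhs: object

# Staged operations: they build IR, but apply rewrites eagerly
# whenever the operands are statically known.
def add(a, b):
    if isinstance(a, Const) and isinstance(b, Const):
        return Const(a.value + b.value)   # constant folding
    if a == Const(0):
        return b                          # 0 + b  ->  b
    if b == Const(0):
        return a                          # a + 0  ->  a
    return Add(a, b)

def mul(a, b):
    if isinstance(a, Const) and isinstance(b, Const):
        return Const(a.value * b.value)   # constant folding
    if a == Const(0) or b == Const(0):
        return Const(0)                   # anything * 0  ->  0
    if a == Const(1):
        return b                          # 1 * b  ->  b
    if b == Const(1):
        return a                          # a * 1  ->  a
    return Mul(a, b)

def transform(node, env):
    """Interpret `node`; because the operations are staged, the result is new IR."""
    if isinstance(node, (Const,)):
        return node
    if isinstance(node, Var):
        # Substitute known bindings; leave unknown variables symbolic.
        return env.get(node.name, node)
    if isinstance(node, Add):
        return add(transform(node.lhs, env), transform(node.rhs, env))
    if isinstance(node, Mul):
        return mul(transform(node.lhs, env), transform(node.rhs, env))
    raise TypeError(f"unknown IR node: {node!r}")

# x * 1 + 2 * 3
expr = Add(Mul(Var("x"), Const(1)), Mul(Const(2), Const(3)))
simplified = transform(expr, {})
specialized = transform(expr, {"x": Const(4)})
```

In this sketch, adding a new optimization means adding a case to a staged operation rather than writing a separate traversal, which is one way to read the claim that rewrites can be combined into a single pass.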
We demonstrate several powerful program optimizations using this architecture that are particularly geared towards data structures: a novel loop fusion and deforestation algorithm, array-of-structs to struct-of-arrays conversion, object flattening, and code generation for heterogeneous parallel devices. We validate our approach using several nontrivial case studies that exhibit order-of-magnitude speedups in experiments.
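To make the array-of-structs to struct-of-arrays conversion concrete, the sketch below shows the two layouts side by side. It is a runtime illustration of the data layout only, under the assumption that every record has the same fields; the paper performs the conversion inside the compiler, and the helper names `aos_to_soa` and `soa_to_aos` are hypothetical.

```python
def aos_to_soa(records):
    """Turn a list of dicts (array of structs) into a dict of lists (struct of arrays).

    Assumes every record has the same keys as the first one.
    """
    if not records:
        return {}
    fields = list(records[0].keys())
    return {field: [record[field] for record in records] for field in fields}

def soa_to_aos(columns):
    """Inverse conversion: rebuild one dict per row from the column lists."""
    if not columns:
        return []
    fields = list(columns.keys())
    length = len(columns[fields[0]])
    return [{field: columns[field][i] for field in fields} for i in range(length)]

# Array of structs: one record per point.
points_aos = [
    {"x": 1.0, "y": 2.0},
    {"x": 3.0, "y": 4.0},
    {"x": 5.0, "y": 6.0},
]

# Struct of arrays: one contiguous column per field.
points_soa = aos_to_soa(points_aos)

# A loop that only needs `x` now touches a single column.
sum_x = sum(points_soa["x"])
```

After the conversion, code that reads one field iterates over a single flat list instead of dereferencing every record, which is the access pattern the struct-of-arrays layout is meant to enable.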
Optimizing data structures in high-level programs: new directions for extensible compilers based on staging (POPL '13)