Relay: a new IR for machine learning frameworks

ABSTRACT
Machine learning powers diverse services in industry, including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; to better balance them, we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely functional, statically typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open-source NNVM compiler framework, which powers Amazon's deep learning framework MXNet.
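To make the phrase "purely functional, statically typed IR" concrete, the sketch below shows a toy tensor IR in that style: immutable expression nodes carry static tensor types, so shape and dtype errors are caught before any code generation. This is an illustrative assumption of what such an IR could look like, not Relay's actual API; the node names (`Var`, `Call`, `TensorType`) and the `type_of` checker are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# A tensor type carries a dtype and a static shape, so operator
# compositions can be checked at compile time.
@dataclass(frozen=True)
class TensorType:
    dtype: str
    shape: Tuple[int, ...]

# Expressions of the toy IR: typed variables and operator calls.
# frozen=True makes nodes immutable, in the purely-functional spirit.
@dataclass(frozen=True)
class Var:
    name: str
    ty: TensorType

@dataclass(frozen=True)
class Call:
    op: str      # e.g. "add"
    args: tuple  # argument expressions

def type_of(expr):
    """Infer an expression's type, rejecting shape/dtype mismatches."""
    if isinstance(expr, Var):
        return expr.ty
    if isinstance(expr, Call) and expr.op == "add":
        lhs, rhs = (type_of(a) for a in expr.args)
        if lhs != rhs:
            raise TypeError(f"mismatch: {lhs} vs {rhs}")
        return lhs
    raise TypeError(f"unknown expression: {expr!r}")

x = Var("x", TensorType("float32", (1, 10)))
y = Var("y", TensorType("float32", (1, 10)))
print(type_of(Call("add", (x, y))))  # a well-typed program type-checks
```

Rejecting ill-typed programs up front is what lets such an IR stay portable: a backend for any hardware target receives only programs whose tensor shapes are already known to be consistent.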