DOI: 10.1145/3559009.3569685

ReACT: Redundancy-Aware Code Generation for Tensor Expressions

Published: 27 January 2023

ABSTRACT

High-level programming models for tensor computations are becoming increasingly popular in many domains, such as machine learning and data science. Index notation is one such model; it is widely adopted for expressing tensor computations algorithmically and as input to programming systems. In these systems, sparse tensors can be specified via type annotations, and a compiler can be employed to generate code for the specified tensor expressions and sparse formats. Different code generation strategies and optimization decisions can have a significant impact on the performance of the generated code. However, the strategies used by current state-of-the-art tensor compilers can leave redundant computations in the output code. In this work, we identify four common types of redundancy that arise when generating code for compound expressions, and we introduce new techniques that avoid them. Empirical evaluation on real-world compound kernels, such as Sampled Dense-Dense Matrix Multiplication (SDDMM), Graph Neural Networks (GNN), and the Matricized-Tensor Times Khatri-Rao Product (MTTKRP), shows that our generated code with redundancy elimination achieves performance improvements of 1.1× to 25× relative to the state-of-the-art Tensor Algebra COmpiler (TACO) and up to 101× relative to library approaches such as SciPy.sparse.
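To make the setting concrete, the sketch below shows an SDDMM kernel, A(i,j) = S(i,j) · Σ_k B(i,k)·C(j,k), lowered so that the dense inner product is evaluated only at the nonzeros of the sparse sampling matrix S stored in CSR. This is a minimal, hypothetical illustration of redundancy-aware lowering in general (the matrix sizes, values, and loop structure are all assumptions for the example), not ReACT's actual generated code; a sparsity-unaware lowering would instead materialize the full dense product B·Cᵀ and then mask it by S, wasting work on every zero of S.

```c
/* Hypothetical sketch of a redundancy-aware SDDMM kernel:
 * A(i,j) = S(i,j) * sum_k B(i,k) * C(j,k), with S sparse in CSR.
 * Iterating only over the nonzeros of S avoids the redundant
 * dense work that a dense-first lowering would perform. */
#include <stdio.h>

#define NI 2  /* rows of S and B            */
#define NJ 3  /* cols of S, rows of C       */
#define NK 2  /* contraction dimension      */
#define NNZ 2 /* nonzeros of S (example)    */

int main(void) {
    /* S in CSR: nonzeros at (0,1) and (1,2) -- example data */
    int    pos[NI + 1] = {0, 1, 2};   /* row pointers   */
    int    crd[NNZ]    = {1, 2};      /* column indices */
    double val[NNZ]    = {4.0, 5.0};  /* nonzero values */

    double B[NI][NK] = {{1, 2}, {3, 4}};
    double C[NJ][NK] = {{1, 0}, {0, 1}, {1, 1}};
    double A[NNZ];                    /* output shares S's sparsity */

    for (int i = 0; i < NI; i++) {
        for (int p = pos[i]; p < pos[i + 1]; p++) { /* nonzeros of row i only */
            int j = crd[p];
            double dot = 0.0;
            for (int k = 0; k < NK; k++)
                dot += B[i][k] * C[j][k];           /* dense inner product   */
            A[p] = val[p] * dot;                    /* sample by S's nonzero */
        }
    }

    for (int p = 0; p < NNZ; p++)
        printf("A[%d] = %g\n", p, A[p]);
    return 0;
}
```

With the example data above, the kernel performs only NNZ·NK multiply-adds for the inner products (printing A[0] = 8 and A[1] = 35), whereas a dense-first lowering would perform NI·NJ·NK of them before discarding the entries masked out by S.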


