research-article

Open Access

ReACT: Redundancy-Aware Code Generation for Tensor Expressions

Authors:
Tong Zhou

Georgia Institute of Technology

Georgia Institute of Technology
View Profile

,
Ruiqin Tian

Horizon Robotics, Shanghai, China

Horizon Robotics, Shanghai, China
View Profile

,
Rizwan A. Ashraf

Pacific Northwest National Laboratory

Pacific Northwest National Laboratory
View Profile

,
Roberto Gioiosa

Pacific Northwest National Laboratory

Pacific Northwest National Laboratory
View Profile

,
Gokcen Kestor

Pacific Northwest National Laboratory

Pacific Northwest National Laboratory
View Profile

,
Vivek Sarkar

Georgia Institute of Technology

Georgia Institute of Technology
View Profile

PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation TechniquesOctober 2022Pages 1–13https://doi.org/10.1145/3559009.3569685

Published:27 January 2023Publication History

PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques

Pages 1–13

ABSTRACT

High-level programming models for tensor computations are becoming increasingly popular in many domains such as machine learning and data science. The index notation is one such model that is widely adopted for expressing a wide range of tensor computations algorithmically and also as input to programming systems. In programming systems, sparse tensors can be specified as type annotations, and a compiler can be employed to perform code generation for the specified tensor expressions and sparse formats. Different code generation strategies and optimization decisions can have a significant impact on the performance of the generated code. However, the code generation strategies used by current state-of-the-art tensor compilers can result in redundant computations being present in the output code. In this work, we identify four common types of redundancies that can occur when generating code for compound expressions, and introduce new techniques that can avoid these redundancies. Empirical evaluation on real-world compound kernels, such as Sampled Dense Dense Matrix Multiplication (SDDMM), Graph Neural Network (GNN) and Matricized-Tensor Times Khatri-Rao Product (MTTKRP) shows that our generated code with redundancy elimination can result in performance improvements of 1.1× to 25× relative to a state-of-the-art Tensor Algebra COmpiler (TACO) and up to 101× relative to library approaches such as the SciPy.sparse.

Supplemental Material

Available for Download

zip

p1-zhou-supp.zip (1.9 MB)

Supplemental files.

References

2017. TACO web tool. http://tensor-compiler.org/codegen.htmlGoogle Scholar
2021. SciPy's implementation of SpMM. https://github.com/scipy/scipy/blob/8a64c938ddf1ae4c02a08d2c5e38daeb8d061d38/scipy/sparse/sparsetools/csr.h#L1172Google Scholar
2022. The COMET compiler. https://github.com/pnnl/COMETGoogle Scholar
2022. NumPy einsum API Reference. https://numpy.org/doc/stable/reference/generated/numpy.einsum.htmlGoogle Scholar
Evrim Acar, Canan Aykut-Bingol, Haluk Bingol, Rasmus Bro, and Bülent Yener. 2007. Multiway analysis of epilepsy tensors. Bioinformatics 23, 13 (2007), i10--i18.Google ScholarDigital Library
Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. 2006. Compilers: Principles, Techniques, and Tools (2nd Edition). Addison-Wesley Longman Publishing Co., Inc., USA.Google ScholarDigital Library
Travis Augustine, Janarthanan Sarma, Louis-Noël Pouchet, and Gabriel Rodríguez. 2019. Generating Piecewise-Regular Code from Irregular Structures. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). Association for Computing Machinery, New York, NY, USA, 625--639. Google ScholarDigital Library
G. Baumgartner, A. Auer, D.E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, Xiaoyang Gao, R.J. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, Chi chung Lam, Qingda Lu, M. Nooijen, R.M. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov. 2005. Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models. Proc. IEEE 93, 2 (2005), 276--292. Google ScholarCross Ref
Guillaume Bouchard, Jason Naradowsky, Sebastian Riedel, Tim Rocktäschel, and Andreas Vlachos. 2015. Matrix and tensor factorization methods for natural language processing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing: Tutorial Abstracts. 16--18.Google ScholarCross Ref
Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. {TVM}: An Automated {End-to-End} Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 578--594.Google Scholar
Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38, 1, Article 1 (dec 2011), 25 pages. Google ScholarDigital Library
Yufei Ding and Xipeng Shen. 2017. GLORE: Generalized Loop Redundancy Elimination upon LER-Notation. Proc. ACM Program. Lang. 1, OOPSLA, Article 74 (Oct. 2017), 28 pages. Google ScholarDigital Library
Guang Gao, Russell Olsen, Vivek Sarkar, and Radhika Thekkath. 1992. Collective loop fusion for array contraction. In International Workshop on Languages and Compilers for Parallel Computing. Springer, 281--295.Google ScholarDigital Library
So Hirata. 2003. Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories. The Journal of Physical Chemistry A 107, 46 (2003), 9887--9897.Google ScholarCross Ref
Ken Kennedy and John R. Allen. 2001. Optimizing Compilers for Modern Architectures: A Dependence-Based Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.Google Scholar
Ken Kennedy and Kathryn S McKinley. 1993. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In International Workshop on Languages and Compilers for Parallel Computing. Springer, 301--320.Google Scholar
Fredrik Kjolstad, Peter Ahrens, Shoaib Kamil, and Saman Amarasinghe. 2019. Tensor Algebra Compilation with Workspaces. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization (Washington, DC, USA) (CGO 2019). IEEE Press, 180--192.Google ScholarDigital Library
Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The Tensor Algebra Compiler. Proc. ACM Program. Lang. 1, OOPSLA, Article 77 (Oct. 2017), 29 pages. Google ScholarDigital Library
Donald E Knuth. 1998. The art of computer programming, 2nd edn. Sorting and searching, vol. 3.Google ScholarDigital Library
Tamara G Kolda and Jimeng Sun. 2008. Scalable tensor decompositions for multiaspect data mining. In 2008 Eighth IEEE international conference on data mining. IEEE, 363--372.Google ScholarDigital Library
Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009), 30--37.Google ScholarDigital Library
Xupeng Li, Bin Cui, Yiru Chen, Wentao Wu, and Ce Zhang. 2017. Mlog: Towards declarative in-database machine learning. Proceedings of the VLDB Endowment 10, 12 (2017), 1933--1936.Google ScholarDigital Library
Julian McAuley and Jure Leskovec. 2013. Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text. In Proceedings of the 7th ACM Conference on Recommender Systems (Hong Kong, China) (RecSys '13). Association for Computing Machinery, New York, NY, USA, 165--172. Google ScholarDigital Library
Erdal Mutlu, Ruiqin Tian, Bin Ren, Sriram Krishnamoorthy, Roberto Gioiosa, Jacques Pienaar, and Gokcen Kestor. 2022. COMET: A Domain-Specific Compilation of High-Performance Computational Chemistry. In Languages and Compilers for Parallel Computing, Barbara Chapman and José Moreira (Eds.). Springer International Publishing, Cham, 87--103.Google Scholar
Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, and Bin Ren. 2021. DNN-Fusion: accelerating deep neural networks execution with advanced operator fusion. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. 883--898.Google ScholarDigital Library
Bo Qiao, Oliver Reiche, Frank Hannig, and Jïrgen Teich. 2019. From loop fusion to kernel fusion: A domain-specific approach to locality optimization. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 242--253.Google ScholarCross Ref
Md. Khaledur Rahman, Majedul Haque Sujon, and Ariful Azad. 2021. FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2021), 256--266.Google Scholar
Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. 2009. Learning optimal ranking with tensor factorization for tag recommendation. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 727--736.Google ScholarDigital Library
V. Sarkar. 1997. Automatic selection of high-order transformations in the IBM XL FORTRAN compilers. IBM Journal of Research and Development 41, 3 (1997), 233--264. Google ScholarDigital Library
Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The Graph Neural Network Model. IEEE Transactions on Neural Networks 20, 1 (2009), 61--80. Google ScholarDigital Library
Ryan Senanayake, Changwan Hong, Ziheng Wang, Amalee Wilson, Stephen Chou, Shoaib Kamil, Saman Amarasinghe, and Fredrik Kjolstad. 2020. A Sparse Iteration Space Transformation Framework for Sparse Tensor Algebra. Proc. ACM Program. Lang. 4, OOPSLA, Article 158 (Nov. 2020), 30 pages. Google ScholarDigital Library
Shaden Smith, Jee W. Choi, Jiajia Li, Richard Vuduc, Jongsoo Park, Xing Liu, and George Karypis. 2017. FROSTT: The Formidable Repository of Open Sparse Tensors and Tools. http://frostt.io/Google Scholar
Shaden Smith, Niranjay Ravindran, Nicholas D. Sidiropoulos, and George Karypis. 2015. SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication. In 2015 IEEE International Parallel and Distributed Processing Symposium. 61--70. Google ScholarDigital Library
Daniele G. Spampinato and Markus Püschel. 2016. A Basic Linear Algebra Compiler for Structured Matrices. In Proceedings of the 2016 International Symposium on Code Generation and Optimization (Barcelona, Spain) (CGO '16). Association for Computing Machinery, New York, NY, USA, 117--127. Google ScholarDigital Library
Michelle Mills Strout, Mary Hall, and Catherine Olschanowsky. 2018. The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code. Proc. IEEE 106, 11 (2018), 1921--1934. Google ScholarCross Ref
Michelle Mills Strout, Alan LaMielle, Larry Carter, Jeanne Ferrante, Barbara Kreaseck, and Catherine Olschanowsky. 2016. An Approach for Code Generation in the Sparse Polyhedral Framework. Parallel Comput. 53, C (April 2016), 32--57. Google ScholarDigital Library
TensorFlow. 2017. XLA: Optimizing Compiler for machine learning. https://www.tensorflow.org/xlaGoogle Scholar
Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, and Gokcen Kestor. 2021. A High Performance Sparse Tensor Algebra Compiler in MLIR. In 2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC). 27--38. Google ScholarCross Ref
Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv:1802.04730 (2018).Google Scholar
Yin Zhang, Min Chen, Shiwen Mao, Long Hu, and Victor CM Leung. 2014. CAP: Community activity prediction based on big data analysis. IEEE Network 28, 4 (2014), 52--57.Google ScholarCross Ref

Index Terms

ReACT: Redundancy-Aware Code Generation for Tensor Expressions
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Recommendations

Detecting code smells in React-based Web apps
Abstract Context:
Facebook’s React is a widely popular JavaScript library to build rich and interactive user interfaces (UI). However, due to the complexity of modern Web UIs, React applications can have hundreds of components and ...
Read More
Image compressive sensing via Truncated Schatten-p Norm regularization

Low-rank property as a useful image prior has attracted much attention in image processing communities. Recently, a nonlocal low-rank regularization (NLR) approach toward exploiting low-rank property has shown the state-of-the-art performance in ...
Read More
Compressive sensing via nonlocal low-rank tensor regularization

The aim of Compressing sensing (CS) is to acquire an original signal, when it is sampled at a lower rate than Nyquist rate previously. In the framework of CS, the original signal is often assumed to be sparse and correlated in some domain. Recently, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques
October 2022
569 pages
ISBN:9781450398688
DOI:10.1145/3559009
General Chair:
Andreas Kloeckner
University of Illinois
,
Program Chair:
José Moreira
IBM
Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 January 2023
Check for updates
Badges
- Artifacts Evaluated & Reusable / v1.1
- Results Reproduced / v1.1
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate121of471submissions,26%
Upcoming Conference
PACT '24

Sponsor:

sigarch

International Conference on Parallel Architectures and Compilation Techniques

October 14 - 16, 2024

Southern California , CA , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1
  Total Citations
  View Citations
- 339
  Total Downloads
- Downloads (Last 12 months)254
- Downloads (Last 6 weeks)14
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

ReACT: Redundancy-Aware Code Generation for Tensor Expressions

PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques

ABSTRACT

Supplemental Material

Available for Download

References

Cited By

Index Terms

Recommendations

Detecting code smells in React-based Web apps

Image compressive sensing via Truncated Schatten-p Norm regularization

Compressive sensing via nonlocal low-rank tensor regularization