ABSTRACT
This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor’s nonzeros in memory. This lets a compiler emit code that performs complex remappings of nonzeros when converting between formats. We also develop a query language that can extract statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how data structures for storing a tensor can be efficiently assembled given specific statistics about the tensor. Disparate formats can implement this common interface, thus letting a compiler emit optimized sparse tensor conversion code for arbitrary combinations of many formats without hard-coding for any specific combination.
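To make the analysis and assembly phases concrete, here is a minimal Python sketch of a COO-to-CSR conversion structured in those two data-dependent phases. The function name and array layout are our own illustration, not the paper's generated code: the analysis phase answers one query (nonzeros per row), and the assembly phase turns that answer into the CSR position array and scatters the nonzeros into place.

```python
def coo_to_csr(num_rows, rows, cols, vals):
    """Convert parallel COO arrays (rows, cols, vals) to CSR arrays
    (pos, crd, val). Works whether or not the input is sorted."""
    # Analysis phase: one statistics query over the tensor --
    # the number of nonzeros in each row.
    counts = [0] * num_rows
    for r in rows:
        counts[r] += 1

    # Assembly phase: prefix-sum the counts into the CSR row-pointer
    # (pos) array, then scatter each nonzero into its row's segment.
    pos = [0] * (num_rows + 1)
    for i in range(num_rows):
        pos[i + 1] = pos[i] + counts[i]

    crd = [0] * len(cols)
    val = [0.0] * len(vals)
    offset = pos[:-1].copy()  # next free slot within each row
    for r, c, v in zip(rows, cols, vals):
        crd[offset[r]] = c
        val[offset[r]] = v
        offset[r] += 1
    return pos, crd, val

pos, crd, val = coo_to_csr(3, [0, 2, 0], [1, 0, 2], [1.0, 2.0, 3.0])
# pos == [0, 2, 2, 3]: row 0 holds columns [1, 2], row 2 holds [0]
```

The same two-phase shape generalizes to other targets: only the query asked during analysis and the scatter performed during assembly change with the destination format.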
Our evaluation shows that the technique generates sparse tensor conversion routines with performance between 1.00× and 2.01× that of hand-optimized versions in SPARSKIT and Intel MKL, two popular sparse linear algebra libraries. And by emitting code that avoids materializing temporaries, which both libraries need for many combinations of source and target formats, our technique outperforms those libraries by 1.78–4.01× for CSC/COO to DIA/ELL conversion.
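The temporary that libraries typically materialize for such conversions is an intermediate CSR copy. A hedged sketch of how a direct COO-to-ELL conversion can avoid it (this is our own illustration, not the compiler's output): the analysis query is just the maximum number of nonzeros in any row, which fixes the ELL width, after which assembly scatters straight into the padded arrays.

```python
def coo_to_ell(num_rows, rows, cols, vals, pad_col=0):
    """Convert parallel COO arrays directly to row-major padded ELL
    arrays, with no intermediate CSR representation."""
    # Analysis phase: the only statistic needed is the maximum row
    # length, which determines the ELL width.
    counts = [0] * num_rows
    for r in rows:
        counts[r] += 1
    width = max(counts) if counts else 0

    # Assembly phase: scatter each nonzero directly into its row's
    # next free slot in the padded arrays.
    crd = [[pad_col] * width for _ in range(num_rows)]
    val = [[0.0] * width for _ in range(num_rows)]
    offset = [0] * num_rows
    for r, c, v in zip(rows, cols, vals):
        crd[r][offset[r]] = c
        val[r][offset[r]] = v
        offset[r] += 1
    return width, crd, val

width, crd, val = coo_to_ell(3, [0, 2, 0], [1, 0, 2], [1.0, 2.0, 3.0])
# width == 2; rows with fewer nonzeros are padded out to that width
```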
PLDI ’20, June 15–20, 2020, London, UK. Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe.
Index Terms
- Automatic generation of efficient sparse tensor format conversion routines