DOI: 10.1145/3385412.3385963 — PLDI 2020 Conference Proceedings

Automatic generation of efficient sparse tensor format conversion routines

Published: 11 June 2020

ABSTRACT

This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor’s nonzeros in memory. This lets a compiler emit code that performs complex remappings of nonzeros when converting between formats. We also develop a query language that can extract statistics about sparse tensors, and we show how to emit efficient analysis code that computes such queries. Finally, we define an abstract interface that captures how data structures for storing a tensor can be efficiently assembled given specific statistics about the tensor. Disparate formats can implement this common interface, thus letting a compiler emit optimized sparse tensor conversion code for arbitrary combinations of many formats without hard-coding for any specific combination.
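To make the three-phase decomposition concrete, here is a minimal sketch (illustrative only, not the paper's generated code) of a COO-to-CSR conversion broken into coordinate remapping, analysis, and assembly. The function name and representation (plain Python lists) are assumptions for exposition:

```python
def coo_to_csr(rows, cols, vals, nrows):
    # Phase 1: coordinate remapping -- reorder nonzeros by row, then
    # column, to match how CSR groups and orders them in memory.
    order = sorted(range(len(vals)), key=lambda k: (rows[k], cols[k]))
    rows = [rows[k] for k in order]
    cols = [cols[k] for k in order]
    vals = [vals[k] for k in order]

    # Phase 2: analysis -- compute a statistic about the tensor; here,
    # the number of nonzeros in each row (a per-row count query).
    counts = [0] * nrows
    for r in rows:
        counts[r] += 1

    # Phase 3: assembly -- use the statistics to build CSR's data
    # structures: the pos (row pointer) array, plus the remapped
    # column index and value arrays.
    pos = [0] * (nrows + 1)
    for i in range(nrows):
        pos[i + 1] = pos[i] + counts[i]
    return pos, cols, vals

# Example: a 2x3 matrix with nonzeros at (1,0), (0,2), (1,1).
pos, crd, vals = coo_to_csr([1, 0, 1], [0, 2, 1], [4.0, 5.0, 6.0], 2)
# pos == [0, 1, 3], crd == [2, 0, 1], vals == [5.0, 4.0, 6.0]
```

In the paper's framework each phase is generated from declarative descriptions (the coordinate remapping notation, attribute queries, and the assembly interface) rather than hand-written as above.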

Our evaluation shows that the technique generates sparse tensor conversion routines with performance between 1.00× and 2.01× that of hand-optimized versions in SPARSKIT and Intel MKL, two popular sparse linear algebra libraries. Moreover, by emitting code that avoids materializing temporaries, which both libraries require for many combinations of source and target formats, our technique outperforms them by 1.78× to 4.01× for CSC/COO to DIA/ELL conversion.
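The temporary-avoidance claim can be illustrated with a hypothetical sketch: instead of first materializing a CSR intermediate, ELL can be assembled directly from COO input once a single analysis query (the maximum nonzeros in any row, which fixes ELL's row width) has been computed. The function name, padding values, and list-of-lists layout are assumptions for illustration:

```python
def coo_to_ell(rows, cols, vals, nrows, pad_col=0, pad_val=0.0):
    # Analysis: one pass over the input counts nonzeros per row; the
    # maximum count determines ELL's fixed row width.
    counts = [0] * nrows
    for r in rows:
        counts[r] += 1
    width = max(counts) if counts else 0

    # Assembly: scatter nonzeros straight into the padded ELL arrays,
    # with no CSR (or other) temporary in between.
    ell_cols = [[pad_col] * width for _ in range(nrows)]
    ell_vals = [[pad_val] * width for _ in range(nrows)]
    fill = [0] * nrows  # next free slot in each row
    for r, c, v in zip(rows, cols, vals):
        ell_cols[r][fill[r]] = c
        ell_vals[r][fill[r]] = v
        fill[r] += 1
    return ell_cols, ell_vals

# Example: same 2x3 matrix as before, given directly in COO form.
ell_cols, ell_vals = coo_to_ell([0, 1, 1], [2, 0, 1], [5.0, 4.0, 6.0], 2)
# ell_cols == [[2, 0], [0, 1]], ell_vals == [[5.0, 0.0], [4.0, 6.0]]
```

Skipping the intermediate format saves both the memory traffic of writing and rereading the temporary and the work of assembling its data structures, which is the source of the reported speedups over library pipelines that route such conversions through CSR.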

