ABSTRACT
The widespread use of tensor operations in describing electronic structure calculations has motivated the design of software frameworks for productive development of scalable optimized tensor-based electronic structure methods. Whereas prior work focused on Cartesian abstractions for dense tensors, we present an algebra to specify and perform tensor operations on a larger class of block-sparse tensors. We illustrate the use of this framework in expressing real-world computational chemistry calculations beyond the reach of existing frameworks.
Index Terms
- Toward generalized tensor algebra for ab initio quantum chemistry methods