skip to main content
research-article
Public Access

Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis

Published:26 January 2017Publication History
Skip Abstract Section

Abstract

The four-index integral transform is a fundamental and computationally demanding calculation used in many computational chemistry suites such as NWChem. It transforms a four-dimensional tensor from one basis to another. This transformation is most efficiently implemented as a sequence of four tensor contractions that each contract a four- dimensional tensor with a two-dimensional transformation matrix. Differing degrees of permutation symmetry in the intermediate and final tensors in the sequence of contractions cause intermediate tensors to be much larger than the final tensor and limit the number of electronic states in the modeled systems.

Loop fusion, in conjunction with tiling, can be very effective in reducing the total space requirement, as well as data movement. However, the large number of possible choices for loop fusion and tiling, and data/computation distribution across a parallel system, make it challenging to develop an optimized parallel implementation for the four-index integral transform. We develop a novel approach to address this problem, using lower bounds modeling of data movement complexity. We establish relationships between available aggregate physical memory in a parallel computer system and ineffective fusion configurations, enabling their pruning and consequent identification of effective choices and a characterization of optimality criteria. This work has resulted in the development of a significantly improved implementation of the four-index transform that enables higher performance and the ability to model larger electronic systems than the current implementation in the NWChem quantum chemistry software suite.

References

  1. ACES II, a program product of the quantum theory project. See http://www.qtp.ufl.edu/aces/, 1996.Google ScholarGoogle Scholar
  2. The massively parallel quantum chemistry program (MPQC). http://www.mpqc.org/index.php, 2004.Google ScholarGoogle Scholar
  3. MOLPRO, a package of ab initio programs. See http://www.molpro. net, 2006.Google ScholarGoogle Scholar
  4. Nwchem: A comprehensive and scalable open-source solution for large scale molecular simulations. See http://www.nwchem-sw. org/index.php, 2010.Google ScholarGoogle Scholar
  5. Psi4, an open-source ab initio electronic structure program. See http://www.psicode.org/, 2012.Google ScholarGoogle Scholar
  6. M. Abe, T. Yanai, T. Nakajima, and K. Hirao. A four-index transformation in dirac's four-component relativistic theory. Chem. Phys. Letters, 388 (1-3): 68--73, 2004. Google ScholarGoogle ScholarCross RefCross Ref
  7. G. Bilardi and E. Peserico. A characterization of temporal locality and its portability across memory hierarchies. Automata, Languages and Programming, pages 128--139, 2001. Google ScholarGoogle ScholarCross RefCross Ref
  8. L. A. Covick and K. M. Sando. Four-index transformation on distributed-memory parallel computers. J. Comp. Chem., 11 (10): 1151--1159, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Dongarra, J.-F. Pineau, Y. Robert, and F. Vivien. Matrix product on heterogeneous master-worker platforms. In PPoPP, pages 53--62, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. Fletcher, M. Schmidt, and M. Gordon. Developments in parallel electronic structure theory. Adv. Chem. Phys., 110: 267--294, 1999. Google ScholarGoogle ScholarCross RefCross Ref
  11. T. R. Furlani and H. F. King. Implementation of a parallel direct scf algorithm on distributed memory computers. J. Comp. Chem., 16 (1): 91--104, 1995. Google ScholarGoogle ScholarCross RefCross Ref
  12. X. Gao, S. Krishnamoorthy, S. K. Sahoo, C. Lam, G. Baumgartner, J. Ramanujam, and P. Sadayappan. Efficient search-space pruning for integrated fusion and tiling transformations. CCPE, 19 (18): 2425--2443, 2007. Google ScholarGoogle ScholarCross RefCross Ref
  13. J.-W. Hong and H. T. Kung. I/O complexity: The red-blue pebble game. In STOC, pages 326--333, 1981.Google ScholarGoogle Scholar
  14. D. Irony, S. Toledo, and A. Tiskin. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel Distrib. Comput., 64 (9): 1017--1026, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C. Lam, T. Rauber, G. Baumgartner, D. Cociorva, and P. Sadayappan. Memory-optimal evaluation of expression trees involving large objects. Comp. Lang. Sys. Struc., 37 (2): 63--75.Google ScholarGoogle Scholar
  16. A. C. Limaye and S. R. Gadre. A general parallel solution to the integral transformation and second-order Møller-Plesset energy evaluation on distributed memory parallel machines. J. Chem. Phys., 100 (2): 1303--1307, 1994. Google ScholarGoogle ScholarCross RefCross Ref
  17. W. Ma, S. Krishnamoorthy, and G. Agrawal. Practical loop transformations for tensor contraction expressions on multi-level memory hierarchies. In CC 2011, pages 266--285, 2011. Google ScholarGoogle ScholarCross RefCross Ref
  18. ga1J. Nieplocha, B. Palmer, V. Tipparaju, M. Krishnan, H. Trease, and E. Aprà. Advances, applications and performance of the global arrays shared memory programming toolkit. Int. J. High Perform. Comput. Appl., 20 (2): 203--231, May 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Pernpointner, L. Visscher, W. A. de Jong, and R. Broer. Parallelization of four-component calculations. i. integral generation, SCF, and four-index transformation in the Dirac-Fock package MOLFDIR. J. Comp. Chem., 21 (13): 1176--1186.Google ScholarGoogle ScholarCross RefCross Ref
  20. G. Rauhut, P. Pulay, and H.-J. Werner. Integral transformation with low-order scaling for large local second-order Møller-Plesset calculations. J. Comp. Chem., 19 (11): 1241--1254.Google ScholarGoogle ScholarCross RefCross Ref
  21. S. Sæbø and J. Almlöf. Avoiding the integral storage bottleneck in LCAO calculations of electron correlation. Chem. Phys. Let., 154 (1): 83 -- 89, 1989. Google ScholarGoogle ScholarCross RefCross Ref
  22. S. K. Sahoo, S. Krishnamoorthy, R. Panuganti, and P. Sadayappan. Integrated loop optimizations for data locality enhancement of tensor contraction expressions. In SC 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, et al. General atomic and molecular electronic structure system. J. Comp. Chem., 14 (11): 1347--1363, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. A. Whiteside, J. S. Binkley, M. E. Colvin, and H. F. Schaefer III. Parallel algorithms for quantum chemistry. i. integral transformations on a hypercube multiprocessor. J. Chem. Phys., 86 (4): 2185--2193, 1987. Google ScholarGoogle ScholarCross RefCross Ref
  25. S. Wilson. Four-index transformations. In Methods in Computational Chemistry, pages 251--309. Springer, 1987. Google ScholarGoogle ScholarCross RefCross Ref
  26. T. L. Windus, M. W. Schmidt, and M. S. Gordon. Parallel algorithm for integral transformations and GUGA MCSCF. Theoretica chimica acta, 89 (1): 77--88, 1994. Google ScholarGoogle ScholarCross RefCross Ref
  27. A. T. Wong, R. J. Harrison, and A. P. Rendell. Parallel direct four-index transformations. Th. Chim. Acta, 93 (6): 317--331.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader