Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel® Xeon Phi™ Processor

  • Conference paper
High Performance Computing (ISC High Performance 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10524)

Abstract

Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are a promising target for these algorithms, as they can provide the computational power needed to solve interesting problems in chemistry. In this paper, we describe our effort to refactor the existing AIMD plane-wave method of NWChem from an MPI-only implementation into a scalable, hybrid code that employs MPI and OpenMP to exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong-scaling results for the complete AIMD simulation on a test case of 256 water molecules, which strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare this performance with that obtained on a cluster of dual-socket Intel® Xeon® E5-2698 v3 processors.

Acknowledgment

This work was supported by the NWChem project in the William R. Wiley Environmental Molecular Sciences Laboratory (EMSL) and by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research ECP program (NWChemEx project). E.J.B. was also supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division at PNNL, DE-AC06-76RLO 1830. EMSL operations are supported by the DOE’s Office of Biological and Environmental Research. M.J. and W.A.D. were partially supported by the Scientific Discovery through Advanced Computing (SciDAC) program funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences. In particular, M.J. was supported by the FASTMath SciDAC institute. We wish to thank the Scientific Computing Staff, Office of Energy Research, and the U.S. Department of Energy for support through the NERSC NESAP program at the National Energy Research Scientific Computing Center (Berkeley, CA). This work was also supported by Intel as part of its Intel Parallel Computing Centers effort. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Intel, Xeon, and Xeon Phi are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

* Other names and brands are the property of their respective owners.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Author information

Corresponding author

Correspondence to Eric J. Bylaska.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bylaska, E.J., Jacquelin, M., de Jong, W.A., Hammond, J.R., Klemm, M. (2017). Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel® Xeon Phi™ Processor. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_30

  • DOI: https://doi.org/10.1007/978-3-319-67630-2_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67629-6

  • Online ISBN: 978-3-319-67630-2

  • eBook Packages: Computer Science (R0)
