DOI: 10.1145/2063384.2063478
research-article

Scaling lattice QCD beyond 100 GPUs

Published: 12 November 2011

ABSTRACT

Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success to the post-Monte Carlo "analysis" phase, which accounts for a substantial fraction of the workload in a typical LQCD calculation, the initial Monte Carlo "gauge field generation" phase requires capability-level supercomputing, corresponding to O(100) GPUs or more. Strong scaling to this regime has not previously been achieved. In this contribution, we demonstrate that a multi-dimensional parallelization strategy combined with a domain-decomposed preconditioner allows us to scale into this regime. We present results for two popular discretizations of the Dirac operator, Wilson-clover and improved staggered, employing up to 256 GPUs on the Edge cluster at Lawrence Livermore National Laboratory.

Published in

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
November 2011
866 pages
ISBN: 9781450307710
DOI: 10.1145/2063384

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SC '11 paper acceptance rate: 74 of 352 submissions, 21%. Overall acceptance rate: 1,516 of 6,373 submissions, 24%.
