research-article

SV-sim: scalable PGAS-based state vector simulation of quantum circuits

Authors:
Ang Li (Quantum Science Center)
Bo Fang (Quantum Science Center)
Christopher Granade (Microsoft Research)
Guen Prawiroatmodjo (Microsoft Research)
Bettina Heim (Microsoft Research)
Martin Roetteler (Microsoft Research)
Sriram Krishnamoorthy (Pacific Northwest National Laboratory)

SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2021, Article No. 97, pages 1–14. https://doi.org/10.1145/3458817.3476169

Published: 13 November 2021

ABSTRACT

High-performance quantum circuit simulation on classical HPC systems remains essential in the NISQ era. Observing that the major obstacle to scalable state-vector quantum simulation is the massive, fine-grained, irregular data exchange with remote nodes, in this paper we present SV-Sim, which applies PGAS-based communication models (i.e., direct peer access for intra-node CPUs/GPUs and SHMEM for inter-node CPU/GPU clusters) to efficient general-purpose quantum circuit simulation. Through an orchestrated design based on device function pointers, SV-Sim abstracts various quantum gates across multiple heterogeneous backends, including IBM/Intel/AMD CPUs, NVIDIA/AMD GPUs, and Intel Xeon Phi, in a unified framework, while still delivering outstanding performance and a tractable interface to higher-level quantum programming environments such as IBM Qiskit, Microsoft Q#, and Google Cirq. By circumventing the lack of polymorphism on GPUs and leveraging device-initiated one-sided communication, SV-Sim can process circuits that are dynamically generated in Python using a single GPU/CPU kernel, without expensive JIT compilation or runtime parsing; this significantly reduces programming complexity and improves performance for QC simulation. This is especially appealing for variational quantum algorithms, where the circuits are synthesized online in every iteration. Evaluations on the NVIDIA DGX-A100, V100-based NVIDIA DGX-2, ALCF Theta, OLCF Spock, and OLCF Summit show that SV-Sim delivers scalable performance on a range of state-of-the-art HPC platforms, offering a useful tool for quantum algorithm validation and verification. SV-Sim has been released at http://github.com/pnnl/sv-sim. A version specially adapted for Q#/QDK is also provided.
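The abstract's two technical ideas, fine-grained remote amplitude exchange and table-driven gate dispatch, can be illustrated with a small single-process Python sketch. It is not SV-Sim's code or API: every name below (`apply_1q`, `apply_cx`, `GATE_TABLE`, `run_circuit`, `remote_pairs`) is hypothetical, qubit 0 is assumed to be the least-significant bit of the basis-state index, "ranks" are simulated by splitting one list into equal contiguous blocks, and a dictionary of callables stands in for a table of device function pointers.

```python
# Minimal single-process sketch of state-vector simulation (not SV-Sim's code).
# Hypothetical conventions: qubit 0 is the least-significant bit of the
# basis-state index, and gates are dispatched through an opcode -> callable
# table, a Python stand-in for a device-function-pointer table.
import cmath
import math


def apply_1q(state, target, m):
    """Apply the 2x2 matrix m = ((m00, m01), (m10, m11)) to qubit `target`."""
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:        # i has the target bit clear ...
            j = i | stride           # ... j is its partner with the bit set
            a0, a1 = state[i], state[j]
            state[i] = m[0][0] * a0 + m[0][1] * a1
            state[j] = m[1][0] * a0 + m[1][1] * a1


def apply_cx(state, control, target):
    """CNOT: swap the target-bit pair wherever the control bit is 1."""
    c, t = 1 << control, 1 << target
    for i in range(len(state)):
        if (i & c) and not (i & t):
            j = i | t
            state[i], state[j] = state[j], state[i]


S = 1 / math.sqrt(2)

# Opcode -> callable(state, qubits, params); one loop serves every gate type.
GATE_TABLE = {
    "h": lambda s, q, p: apply_1q(s, q[0], ((S, S), (S, -S))),
    "x": lambda s, q, p: apply_1q(s, q[0], ((0, 1), (1, 0))),
    "rz": lambda s, q, p: apply_1q(
        s, q[0], ((cmath.exp(-0.5j * p[0]), 0), (0, cmath.exp(0.5j * p[0])))),
    "cx": lambda s, q, p: apply_cx(s, q[0], q[1]),
}


def run_circuit(n_qubits, circuit):
    """Start from |0...0> and apply a list of (opcode, qubits, params) tuples."""
    state = [0j] * (1 << n_qubits)
    state[0] = 1 + 0j
    for op, qubits, params in circuit:
        GATE_TABLE[op](state, qubits, params)
    return state


def remote_pairs(n_qubits, target, n_ranks):
    """Count amplitude pairs touched by a 1-qubit gate on `target` whose two
    members fall in different blocks when the vector is split into `n_ranks`
    equal contiguous blocks (n_ranks assumed to be a power of two)."""
    local = (1 << n_qubits) // n_ranks
    stride = 1 << target
    return sum(
        1
        for i in range(1 << n_qubits)
        if (i & stride) == 0 and i // local != (i | stride) // local
    )


if __name__ == "__main__":
    # Bell state: amplitudes of |00> and |11> are both 1/sqrt(2).
    bell = run_circuit(2, [("h", (0,), ()), ("cx", (0, 1), ())])
    print([round(abs(a), 3) for a in bell])          # [0.707, 0.0, 0.0, 0.707]
    # A 4-qubit vector over 2 ranks: only gates on qubit 3 cross ranks.
    print([remote_pairs(4, q, 2) for q in range(4)])  # [0, 0, 0, 8]
```

Under this assumed block layout with P ranks (P a power of two), a single-qubit gate on any of the top log2(P) qubits pairs every amplitude with a partner held by another rank, while gates on lower qubits stay local; this is the kind of irregular, per-amplitude remote access that SV-Sim maps onto peer access and SHMEM. Because the circuit is plain data, a variational loop could rebuild it with new `rz` angles each iteration and feed it to the same dispatch loop without recompilation.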

Index Terms

1. Computing methodologies
  1. Modeling and simulation
  2. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        1. Parallel programming languages

Index terms have been assigned to the content through auto-classification.

Published in
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2021
1493 pages
ISBN: 9781450384421
DOI: 10.1145/3458817
General Chair: Bronis R. de Supinski
Program Chairs: Mary Hall, Todd Gamblin
Copyright © 2021 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States

Badges
- Artifacts Evaluated & Functional / v1.1
- Artifacts Available / v1.1
Author Tags
GPU
NVSHMEM
OpenSHMEM
quantum simulation

Acceptance Rates
Overall acceptance rate: 1,516 of 6,373 submissions (24%)
Article Metrics
- 7 total citations
- 568 total downloads
- 215 downloads (last 12 months)
- 31 downloads (last 6 weeks)
