BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs

ABSTRACT
Recent studies have shown that binary Graph Neural Networks (GNNs) are a promising way to reduce GNN computation through binarized tensors. Prior work, however, has focused mainly on algorithm design and training techniques, leaving open the question of how to fully realize this performance potential on accelerator hardware. This work redesigns the binary GNN inference backend from the efficiency perspective. It fills the gap by proposing a series of abstractions and techniques that map binary GNNs and their computations onto GPUs in ways that best fit the nature of bit manipulations. Results on real-world graphs with GCN, GraphSAGE, and GraphSAINT show that the proposed techniques outperform state-of-the-art binary GNN implementations by 8-22X while maintaining the same accuracy. The BitGNN code is publicly available.
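To see why binarized tensors save computation, consider the standard XNOR+popcount trick from the binary neural network literature: once {-1, +1} vectors are packed into machine words, a floating-point dot product collapses into one bitwise XOR and one population count, which GPUs execute natively. The sketch below is a minimal CPU illustration of that trick under assumed names (`pack_bits`, `binary_dot`); it is not the BitGNN implementation itself.

```python
# Minimal sketch (not the BitGNN code) of the XNOR+popcount technique
# that binary neural networks use to replace floating-point dot
# products with bitwise operations on packed words.

def pack_bits(values):
    """Pack a list of +1/-1 values into an integer bitmask (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(word_a, word_b, n):
    """Dot product of two length-n {-1, +1} vectors given their bit packings.

    Matching bit pairs (XNOR) contribute +1 and mismatching pairs -1,
    so dot = n - 2 * popcount(a XOR b).
    """
    mismatches = bin((word_a ^ word_b) & ((1 << n) - 1)).count("1")
    return n - 2 * mismatches

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
# Reference float dot product: 1 - 1 - 1 + 1 = 0
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == 0
```

On a GPU, the XOR and popcount map directly to hardware instructions (e.g., a single population-count intrinsic per 32- or 64-bit word), which is the kind of bit manipulation the proposed backend is designed around.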