BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs

ABSTRACT
Recent studies have shown that binary Graph Neural Networks (GNNs) are a promising way to reduce GNN computation through binarized tensors. Prior work, however, has focused mainly on algorithm design and training techniques, leaving open the question of how to fully realize this performance potential on accelerator hardware. This work redesigns the binary GNN inference backend from the efficiency perspective. It fills the gap by proposing a series of abstractions and techniques that map binary GNNs and their computations onto GPUs in ways that best fit the nature of bit manipulations. Results on real-world graphs with GCN, GraphSAGE, and GraphSAINT show that the proposed techniques outperform state-of-the-art binary GNN implementations by 8-22X while maintaining the same accuracy. The BitGNN code is publicly available.
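To see why binarized tensors save computation, consider the standard XNOR+popcount trick from the binary neural network literature: once {-1, +1} vectors are packed into machine words, a floating-point dot product collapses into one bitwise XOR and one population count, which GPUs execute natively. The sketch below is a minimal CPU illustration of that trick under assumed names (`pack_bits`, `binary_dot`); it is not the BitGNN implementation itself.

```python
# Minimal sketch (not the BitGNN code) of the XNOR+popcount technique
# that binary neural networks use to replace floating-point dot
# products with bitwise operations on packed words.

def pack_bits(values):
    """Pack a list of +1/-1 values into an integer bitmask (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(word_a, word_b, n):
    """Dot product of two length-n {-1, +1} vectors given their bit packings.

    Matching bit pairs (XNOR) contribute +1 and mismatching pairs -1,
    so dot = n - 2 * popcount(a XOR b).
    """
    mismatches = bin((word_a ^ word_b) & ((1 << n) - 1)).count("1")
    return n - 2 * mismatches

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
# Reference float dot product: 1 - 1 - 1 + 1 = 0
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == 0
```

On a GPU, the XOR and popcount map directly to hardware instructions (e.g., a single population-count intrinsic per 32- or 64-bit word), which is the kind of bit manipulation the proposed backend is designed around.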