research-article

Spy in the GPU-box: Covert and Side Channel Attacks on Multi-GPU Systems

Authors:

Sankha Baran Dutta
Pacific Northwest National Laboratory, Richland, USA
https://orcid.org/0009-0003-5691-6405

Hoda Naghibijouybari
Binghamton University, Binghamton, USA
https://orcid.org/0000-0003-0468-3032

Arjun Gupta
University of New Mexico, Albuquerque, USA
https://orcid.org/0000-0003-4573-2937

Nael Abu-Ghazaleh
University of California, Riverside, Riverside, USA
https://orcid.org/0000-0002-9485-5370

Andres Marquez
Pacific Northwest National Lab, Richland, USA
https://orcid.org/0000-0002-4313-1882

Kevin Barker
Pacific Northwest National Lab, Richland, USA
https://orcid.org/0000-0003-4947-0559

ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, June 2023, Article No.: 45, Pages 1–13, https://doi.org/10.1145/3579371.3589080

Published: 17 June 2023

ABSTRACT

The deep learning revolution has been enabled in large part by GPUs, and more recently accelerators, which make it possible to carry out computationally demanding training and inference in acceptable times. As the size of machine learning networks and workloads continues to increase, multi-GPU machines have emerged as an important platform offered in High Performance Computing and cloud data centers. Because these machines are shared among multiple users, it becomes increasingly important to protect applications against potential attacks. In this paper, we explore the vulnerability of Nvidia's DGX multi-GPU machines to covert and side channel attacks. These machines consist of a number of discrete GPUs that are interconnected through a combination of a custom interconnect (NVLink) and PCIe connections. We reverse engineer the interconnected cache hierarchy and show that it is possible for an attacker on one GPU to cause contention on the L2 cache of another GPU. We use this observation to first develop a covert channel attack across two GPUs, achieving a best bandwidth of around 4 MB/s. We also develop a prime and probe attack on a remote GPU, allowing an attacker to recover the cache access pattern of another workload. This access pattern can be used in any number of side channel attacks: we demonstrate a proof of concept attack that fingerprints the application running on the remote GPU with high accuracy. We also develop a proof of concept attack to extract hyperparameters of a machine learning workload. Our work establishes for the first time the vulnerability of these machines to microarchitectural attacks and can guide future research to improve their security.
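
To make the prime and probe idea in the abstract concrete, the following is a minimal sketch that simulates it on a toy cache in Python. It is not the authors' implementation and does not model any real GPU L2 cache: the class `ToySetAssociativeCache`, the helper names, the LRU policy, and the cache parameters (4 sets, 2 ways, 64-byte lines) are all assumptions chosen for illustration. A real attack would infer hits and misses from access latency, whereas here the simulated cache reports them directly.

```python
from collections import OrderedDict


class ToySetAssociativeCache:
    """Toy LRU set-associative cache (hypothetical parameters, not a real GPU L2)."""

    def __init__(self, num_sets=4, ways=2, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set; insertion order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def set_index(self, addr):
        return (addr // self.line_size) % self.num_sets

    def access(self, addr):
        """Return True on a hit, False on a miss, and update the LRU state."""
        tag = addr // self.line_size
        lines = self.sets[self.set_index(addr)]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        return False


def eviction_set(cache, target_set, base=0x100000):
    """Attacker-owned addresses that all map to target_set and fill its ways."""
    stride = cache.line_size * cache.num_sets
    first = base + target_set * cache.line_size
    return [first + i * stride for i in range(cache.ways)]


def prime(cache, addrs):
    """Fill the target set with the attacker's lines."""
    for addr in addrs:
        cache.access(addr)


def probe(cache, addrs):
    """Re-access the attacker's lines and count how many were evicted."""
    return sum(0 if cache.access(addr) else 1 for addr in addrs)


def observe(victim_addrs):
    """Prime each set, let the victim run, then probe; return misses per set."""
    cache = ToySetAssociativeCache()
    misses = []
    for target_set in range(cache.num_sets):
        addrs = eviction_set(cache, target_set)
        prime(cache, addrs)
        for addr in victim_addrs:  # victim activity between prime and probe
            cache.access(addr)
        misses.append(probe(cache, addrs))
    return misses
```

In this toy model, calling `observe([0x40])` reports probe misses only for set 1, the set that address `0x40` maps to, while `observe([])` reports no misses at all. The per-set miss counts are the kind of access pattern that a side channel attack would then analyze.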

Index Terms

Spy in the GPU-box: Covert and Side Channel Attacks on Multi-GPU Systems
1. Computing methodologies
2. Security and privacy

Index terms have been assigned to the content through auto-classification.

Published in
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture
June 2023
1225 pages
ISBN:9798400700958
DOI:10.1145/3579371
Chair:
Yan Solihin,
General Chair:
Mark Heinrich
University of Central Florida
Copyright © 2023 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 543 of 3,203 submissions, 17%