ABSTRACT
The deep learning revolution has been enabled in large part by GPUs and, more recently, accelerators, which make it possible to carry out computationally demanding training and inference in acceptable time. As the size of machine learning networks and workloads continues to increase, multi-GPU machines have emerged as an important platform offered in high-performance computing and cloud data centers. Because these machines are shared among multiple users, protecting applications against potential attacks becomes increasingly important. In this paper, we explore the vulnerability of Nvidia's DGX multi-GPU machines to covert and side channel attacks. These machines consist of a number of discrete GPUs that are interconnected through a combination of a custom interconnect (NVLink) and PCIe connections. We reverse engineer the interconnected cache hierarchy and show that an attacker on one GPU can cause contention on the L2 cache of another GPU. We use this observation to first develop a covert channel attack across two GPUs, achieving a peak bandwidth of around 4 MB/s. We also develop a prime-and-probe attack on a remote GPU, allowing an attacker to recover the cache access pattern of another workload. This access pattern can be used in any number of side channel attacks: we demonstrate a proof-of-concept attack that fingerprints the application running on the remote GPU with high accuracy. We also develop a proof-of-concept attack to extract hyperparameters of a machine learning workload. Our work establishes for the first time the vulnerability of these machines to microarchitectural attacks and can guide future research to improve their security.
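To make the prime-and-probe idea concrete, the following is a minimal sketch in Python of how contention on a shared set-associative cache can reveal which sets another party touched, and how the same effect can carry bits over a covert channel. It is a toy simulation, not the paper's implementation: the `ToyCache` class, its geometry (8 sets, 4 ways, 128-byte lines), the LRU policy, and all helper names are assumptions made for illustration. A real attack would infer hits and misses from access latency on the remote GPU's L2 rather than read them directly.

```python
from collections import OrderedDict


class ToyCache:
    """Toy set-associative cache with LRU replacement (hypothetical geometry)."""

    def __init__(self, num_sets=8, ways=4, line_size=128):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss, updating LRU state."""
        line = addr // self.line_size
        idx = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[idx]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[tag] = None
        return False


def eviction_set(cache, set_idx, base_tag=1000):
    """Attacker addresses, one per way, that all map to set_idx."""
    return [((base_tag + way) * cache.num_sets + set_idx) * cache.line_size
            for way in range(cache.ways)]


def prime(cache, addrs):
    """Fill the target set with the attacker's lines."""
    for addr in addrs:
        cache.access(addr)


def probe(cache, addrs):
    """Re-access the attacker's lines and count misses."""
    return sum(not cache.access(addr) for addr in addrs)


def prime_probe_round(cache, victim_addrs):
    """Return, per cache set, whether the victim touched it."""
    ev_sets = [eviction_set(cache, i) for i in range(cache.num_sets)]
    for ev in ev_sets:
        prime(cache, ev)
    for addr in victim_addrs:  # victim runs between prime and probe
        cache.access(addr)
    return [probe(cache, ev) > 0 for ev in ev_sets]


def send_bits(cache, bits, set_idx=0):
    """Covert channel: the sender touches set_idx to signal a 1 bit."""
    ev = eviction_set(cache, set_idx)
    sender_addr = set_idx * cache.line_size  # maps to set_idx with tag 0
    received = []
    for bit in bits:
        prime(cache, ev)
        if bit:
            cache.access(sender_addr)
        received.append(int(probe(cache, ev) > 0))
    return received


if __name__ == "__main__":
    line = 128
    print(prime_probe_round(ToyCache(), [0 * line, 3 * line, 5 * line]))
    print(send_bits(ToyCache(), [1, 0, 1, 1, 0]))
```

In this model, a set that the victim leaves alone probes with zero misses, while a single victim access to a fully primed set evicts one attacker line and shows up as misses during the probe. The per-set miss vector is the "cache access pattern" that a fingerprinting classifier could consume.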
Index Terms
- Spy in the GPU-box: Covert and Side Channel Attacks on Multi-GPU Systems