Can the Network be the AI Accelerator?

ABSTRACT
Artificial Neural Networks (NNs) play an increasingly important role in many services and applications, contributing significantly to the workloads of compute infrastructures. When used in latency-sensitive services, NNs are usually processed by CPUs, since offloading to an external dedicated hardware accelerator would be inefficient. However, with growing workload size and complexity, CPUs are hitting their computational limits, motivating the introduction of new specialized hardware accelerators tailored to the task. In this paper we analyze the option of using programmable network devices, such as network interface cards and switches, as NN accelerators in place of purpose-built dedicated hardware. To this end, in this preliminary work we analyze in depth the properties of NN processing on CPUs, derive options to efficiently split such processing, and show that programmable network devices may be a suitable engine for implementing a CPU's NN co-processor.
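The feasibility argument hinges on the fact that match-action pipelines in programmable network devices support bitwise logic but not floating-point arithmetic. A binarized NN layer (weights and activations constrained to {-1, +1} and packed as bits, as in the binarized-network literature the paper builds on) reduces a dot product to an XNOR followed by a population count. The following sketch is illustrative only, not code from the paper; the function names and the per-neuron packing are assumptions made for the example:

```python
# Illustrative sketch: a binarized layer computed purely with bitwise
# operations, the kind of primitive a match-action pipeline can host.
# Encoding assumption: a 1 bit represents +1, a 0 bit represents -1.

def popcount(x: int) -> int:
    """Count set bits (population count) of a non-negative integer."""
    return bin(x).count("1")

def binarized_neuron(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors packed into n-bit integers.

    XNOR marks the positions where input and weight agree; each
    agreement contributes +1 and each disagreement -1, so the dot
    product equals 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agreements = popcount(~(x_bits ^ w_bits) & mask)
    return 2 * agreements - n

def binarized_layer(x_bits: int, weights: list[int], n: int) -> int:
    """One layer: the sign of each neuron's output, re-packed as bits."""
    out = 0
    for i, w in enumerate(weights):
        if binarized_neuron(x_bits, w, n) >= 0:
            out |= 1 << i
    return out
```

For example, `binarized_neuron(0b1011, 0b1011, 4)` returns 4 (full agreement of a 4-element vector with itself), while `binarized_neuron(0b1011, 0b0100, 4)` returns -4. Because only XOR, NOT, AND, and popcount are involved, each neuron maps naturally onto the ALU primitives of a programmable switch pipeline.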