Can the Network be the AI Accelerator?

ABSTRACT
Artificial Neural Networks (NNs) play an increasingly important role in many services and applications, contributing significantly to the workloads of compute infrastructures. When used in latency-sensitive services, NNs are usually processed by CPUs, since offloading to an external dedicated hardware accelerator would be inefficient. However, with growing workload size and complexity, CPUs are hitting their computational limits, motivating the introduction of new specialized hardware accelerators tailored to the task. In this paper we analyze the option of using programmable network devices, such as network interface cards and switches, as NN accelerators in place of purpose-built dedicated hardware. To this end, in this preliminary work we analyze in depth the properties of NN processing on CPUs, derive options to efficiently split such processing, and show that programmable network devices may be a suitable engine for implementing a CPU's NN co-processor.
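The feasibility argument hinges on the fact that match-action pipelines in programmable network devices support bitwise logic but not floating-point arithmetic. A binarized NN layer (weights and activations constrained to {-1, +1} and packed as bits, as in the binarized-network literature the paper builds on) reduces a dot product to an XNOR followed by a population count. The following sketch is illustrative only, not code from the paper; the function names and the per-neuron packing are assumptions made for the example:

```python
# Illustrative sketch: a binarized layer computed purely with bitwise
# operations, the kind of primitive a match-action pipeline can host.
# Encoding assumption: a 1 bit represents +1, a 0 bit represents -1.

def popcount(x: int) -> int:
    """Count set bits (population count) of a non-negative integer."""
    return bin(x).count("1")

def binarized_neuron(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors packed into n-bit integers.

    XNOR marks the positions where input and weight agree; each
    agreement contributes +1 and each disagreement -1, so the dot
    product equals 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agreements = popcount(~(x_bits ^ w_bits) & mask)
    return 2 * agreements - n

def binarized_layer(x_bits: int, weights: list[int], n: int) -> int:
    """One layer: the sign of each neuron's output, re-packed as bits."""
    out = 0
    for i, w in enumerate(weights):
        if binarized_neuron(x_bits, w, n) >= 0:
            out |= 1 << i
    return out
```

For example, `binarized_neuron(0b1011, 0b1011, 4)` returns 4 (full agreement of a 4-element vector with itself), while `binarized_neuron(0b1011, 0b0100, 4)` returns -4. Because only XOR, NOT, AND, and popcount are involved, each neuron maps naturally onto the ALU primitives of a programmable switch pipeline.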