research-article

Open Access

SPINN: synergistic progressive inference of neural networks over device and cloud

Authors:
Stefanos Laskaridis (University of Cambridge)
Stylianos I. Venieris (University of Cambridge)
Mario Almeida (University of Cambridge)
Ilias Leontiadis (University of Cambridge)
Nicholas D. Lane (University of Cambridge)

MobiCom '20: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, April 2020, Article No. 37, pages 1–15. https://doi.org/10.1145/3372224.3419194

Published: 18 September 2020

ABSTRACT

Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative is to offload CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a novel scheduler that co-optimises the early-exit policy and the CNN splitting at run time, in order to adapt to dynamic conditions and meet user-defined service-level requirements. Quantitative evaluation shows that SPINN outperforms its state-of-the-art collaborative inference counterparts by up to 2× in achieved throughput under varying network conditions, reduces the server cost by up to 6.8× and improves accuracy by 20.7% under latency constraints, while providing robust operation under uncertain connectivity conditions and significant energy savings compared to cloud-centric execution.
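The abstract describes a scheduler that jointly chooses where to split the CNN between device and cloud and how aggressively to exit early. The Python sketch below illustrates that idea. It is not SPINN's actual algorithm and rests on simplifying assumptions made here:

- The network is a chain of blocks with profiled per-block device and server latencies.
- The early-exit policy is a single top-1 confidence threshold.
- The fraction of inputs exiting on the device and the resulting accuracy come from hypothetical offline-profiling callbacks (`exit_rate`, `accuracy`).
- The service-level requirement is a bound on expected latency.

All names (`first_confident_exit`, `Profile`, `offload_ms`, `schedule`) are illustrative.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def first_confident_exit(exit_probs: Sequence[Sequence[float]], threshold: float) -> int:
    """Return the index of the first exit whose top-1 probability reaches `threshold`.

    Falls back to the final exit when no earlier exit is confident enough.
    (Assumed confidence-threshold criterion, used here for illustration only.)
    """
    for i, probs in enumerate(exit_probs):
        if max(probs) >= threshold:
            return i
    return len(exit_probs) - 1


@dataclass(frozen=True)
class Profile:
    """Hypothetical offline profile of a CNN viewed as a chain of blocks."""
    device_ms: Sequence[float]  # latency of each block on the device
    server_ms: Sequence[float]  # latency of each block on the server
    out_bytes: Sequence[int]    # activation size produced by block i
    input_bytes: int            # raw input size, sent when split == 0


def offload_ms(p: Profile, split: int, bandwidth_mbps: float, rtt_ms: float) -> float:
    """Transfer plus server time when blocks [split:] run in the cloud."""
    if split == len(p.device_ms):
        return 0.0  # the whole network runs on the device; nothing is sent
    payload = p.input_bytes if split == 0 else p.out_bytes[split - 1]
    transfer = payload * 8 * 1e3 / (bandwidth_mbps * 1e6)  # bits / (bits per second), in ms
    return transfer + rtt_ms + sum(p.server_ms[split:])


def schedule(
    p: Profile,
    thresholds: Sequence[float],
    exit_rate: Callable[[int, float], float],  # hypothetical: share of inputs exiting on device
    accuracy: Callable[[int, float], float],   # hypothetical: profiled accuracy per config
    bandwidth_mbps: float,
    rtt_ms: float,
    sla_ms: float,
) -> Optional[Tuple[int, float, float]]:
    """Pick (split, threshold) maximising accuracy with expected latency <= sla_ms.

    Returns (split, threshold, expected_latency_ms), or None if nothing is feasible.
    """
    best = None
    for split, thr in product(range(len(p.device_ms) + 1), thresholds):
        # Conservatively charges the full device prefix even to inputs that exit early.
        device = sum(p.device_ms[:split])
        # Inputs that exit on the device skip the transfer and server stages.
        remote = offload_ms(p, split, bandwidth_mbps, rtt_ms)
        expected = device + (1.0 - exit_rate(split, thr)) * remote
        if expected > sla_ms:
            continue
        key = (accuracy(split, thr), -expected)  # higher accuracy first, then lower latency
        if best is None or key > best[0]:
            best = (key, split, thr, expected)
    return None if best is None else best[1:]
```

Re-running `schedule` whenever the bandwidth estimate changes is what makes the choice adaptive. On a slow link, the best feasible split tends to move deeper into the network, where activations are smaller. On a fast link, offloading early becomes attractive.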

Index Terms

1. Computing methodologies
  1. Distributed computing methodologies
2. Human-centered computing
  1. Ubiquitous and mobile computing

Published in

MobiCom '20: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking
April 2020
621 pages
ISBN:9781450370851
DOI:10.1145/3372224

Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall acceptance rate: 440 of 2,972 submissions (15%)
Article Metrics
- 136 total citations
- 3,768 total downloads
- 1,260 downloads (last 12 months)
- 172 downloads (last 6 weeks)