ABSTRACT
In recent years, the rapid development of edge computing has enabled a wide variety of intelligent applications, such as real-time video analytics, to be processed at the edge. However, edge computing can suffer from service outages caused by fluctuating wireless connections or congested computing resources. During a service outage, the only choice is to process deep neural network (DNN) inference on the local mobile device. The obstacle is that, due to limited resources, it may not be possible to complete inference tasks on time. Inspired by the recently developed early exit of DNNs, where inference can exit a DNN at an earlier layer to shorten the inference delay by sacrificing an acceptable level of accuracy, we propose to adopt this mechanism to process inference tasks during a service outage. The challenge is how to obtain the optimal schedule given diverse early exit choices. To this end, we formulate an optimal scheduling problem with the objective of maximizing a general overall utility. However, the problem takes the form of integer programming, which cannot be solved by a standard approach. We therefore prove the Ordered Scheduling structure, which indicates that a frame that arrives earlier must be scheduled earlier. This structure greatly reduces the search space for an optimal solution. We then propose the Scheduling Early Exit (SEE) algorithm, based on dynamic programming, which solves the problem optimally with polynomial computational complexity. Finally, we conduct trace-driven simulations and compare SEE with two benchmarks. The results show that SEE can outperform the benchmarks by 50.9%.
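To make the scheduling idea concrete, the following is a minimal sketch of a dynamic program over frames processed in arrival order. It is not the SEE algorithm itself, whose details the abstract does not give. The model is an assumption made for illustration: a single local processor, integer arrival times and deadlines, a fixed integer processing time and an additive utility for each exit, and the option to skip a frame for zero utility. The function name `schedule_early_exits` and all parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def schedule_early_exits(
    arrivals: List[int],
    deadlines: List[int],
    exit_times: List[int],
    exit_utils: List[float],
) -> Tuple[float, List[Optional[int]]]:
    """Pick one exit (or None = skip) per frame to maximize total utility.

    Assumed model (not taken from the paper): frames are handled strictly
    in arrival order on one processor, and a frame only earns utility if
    it finishes by its deadline.
    """
    # State: time at which the processor becomes free
    #        -> (best utility so far, exit chosen for each frame so far).
    states: Dict[int, Tuple[float, Tuple[Optional[int], ...]]] = {0: (0.0, ())}

    for arrive, deadline in zip(arrivals, deadlines):
        nxt: Dict[int, Tuple[float, Tuple[Optional[int], ...]]] = {}
        for free_at, (util, picks) in states.items():
            # Option 1: skip this frame, processor state is unchanged.
            candidates = [(free_at, util, picks + (None,))]
            # Option 2: run the frame up to exit k, if it meets the deadline.
            start = max(free_at, arrive)
            for k, (t, u) in enumerate(zip(exit_times, exit_utils)):
                finish = start + t
                if finish <= deadline:
                    candidates.append((finish, util + u, picks + (k,)))
            # Keep only the best utility for each processor-free time.
            for f, total, p in candidates:
                if f not in nxt or total > nxt[f][0]:
                    nxt[f] = (total, p)
        states = nxt

    best_util, best_picks = max(states.values(), key=lambda s: s[0])
    return best_util, list(best_picks)


if __name__ == "__main__":
    # Three frames, two exits: a fast one (1 time unit) and a full pass (3 units).
    utility, picks = schedule_early_exits(
        arrivals=[0, 1, 2],
        deadlines=[3, 4, 5],
        exit_times=[1, 3],
        exit_utils=[3.0, 5.0],
    )
    print(utility, picks)
```

Because frames are only ever considered in arrival order, the state reduces to the time at which the processor becomes free. With integer times, the number of states per frame is bounded by the time horizon, so the sketch runs in time proportional to the number of frames, the number of exits, and the horizon.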
Index Terms
- SEE: Scheduling Early Exit for Mobile DNN Inference during Service Outage