research-article

SEE: Scheduling Early Exit for Mobile DNN Inference during Service Outage

Authors:
Zizhao Wang, University of Sydney, Sydney, Australia
Wei Bao, University of Sydney, Sydney, Australia
Dong Yuan, University of Sydney, Sydney, Australia
Liming Ge, University of Sydney, Sydney, Australia
Nguyen H. Tran, University of Sydney, Sydney, Australia
Albert Y. Zomaya, University of Sydney, Sydney, Australia

MSWIM '19: Proceedings of the 22nd International ACM Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems
November 2019
Pages 279–288
https://doi.org/10.1145/3345768.3355917

Published: 25 November 2019

ABSTRACT

In recent years, the rapid development of edge computing has made it possible to run a wide variety of intelligent applications at the edge, such as real-time video analytics. However, edge computing can suffer from service outages caused by fluctuating wireless connections or congested computing resources. During a service outage, the only choice is to run deep neural network (DNN) inference on the local mobile device. The obstacle is that, with limited local resources, it may not be possible to complete inference tasks on time. Inspired by the recently developed early exit of DNNs, where inference exits the DNN at an earlier layer to shorten the inference delay at the cost of an acceptable loss of accuracy, we propose to adopt this mechanism to process inference tasks during the service outage. The challenge is how to obtain the optimal schedule given the diverse early exit choices. To this end, we formulate an optimal scheduling problem with the objective of maximizing a general overall utility. However, the problem takes the form of an integer program, which cannot be solved by a standard approach. We therefore prove the Ordered Scheduling structure, which indicates that a frame that arrives earlier must be scheduled earlier. This structure greatly reduces the search space for an optimal solution. We then propose the Scheduling Early Exit (SEE) algorithm, based on dynamic programming, which solves the problem optimally with polynomial computational complexity. Finally, we conduct trace-driven simulations and compare SEE with two benchmarks. The results show that SEE can outperform the benchmarks by 50.9%.
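To make the idea concrete, the following is a minimal sketch of how an ordered schedule with early exits might be searched by dynamic programming. It is not the SEE algorithm itself, whose formulation is not given in the abstract. The model is an assumption made for illustration: a single local processor, integer time slots, one absolute deadline per frame, a fixed processing time and utility per exit, and the option to drop a frame for zero utility. The function name `schedule_early_exits` and all parameter names are hypothetical.

```python
from functools import lru_cache


def schedule_early_exits(arrivals, deadlines, exit_times, exit_utils):
    """Toy DP over frames in arrival order (assumed model, not the paper's).

    arrivals[i], deadlines[i]: integer arrival time and absolute deadline of frame i
    exit_times[k], exit_utils[k]: integer processing time and utility of exit k
    Returns (best total utility, chosen exit index per frame; None means dropped).
    """
    n = len(arrivals)

    @lru_cache(maxsize=None)
    def best(i, free_at):
        # Best utility from frames i..n-1 when the processor becomes free at free_at.
        if i == n:
            return 0, ()
        # Option 1: drop frame i and move on.
        best_val, rest = best(i + 1, free_at)
        best_plan = (None,) + rest
        # Option 2: process frame i at some exit, never before it arrives.
        start = max(free_at, arrivals[i])
        for k, (t, u) in enumerate(zip(exit_times, exit_utils)):
            finish = start + t
            if finish <= deadlines[i]:  # only exits that meet the deadline
                val, rest = best(i + 1, finish)
                if u + val > best_val:
                    best_val, best_plan = u + val, (k,) + rest
        return best_val, best_plan

    value, plan = best(0, 0)
    return value, list(plan)
```

Because frames are only ever considered in arrival order, the state reduces to the next frame index and the time at which the processor becomes free, which is what keeps the search small in this toy setting. Calling `schedule_early_exits([0, 1, 2], [4, 5, 6], [1, 3], [60, 90])`, for example, explores mixes of the shallow and deep exit across three frames and returns the highest-utility mix that meets every deadline.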

Index Terms

1. Networks
  1. Network performance evaluation
    1. Network performance modeling


Published in
MSWIM '19: Proceedings of the 22nd International ACM Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems
November 2019
340 pages
ISBN: 9781450369046
DOI: 10.1145/3345768
General Chair: Antonio A. F. Loureiro, Federal University of Minas Gerais, Brazil
Program Chairs: Salil Kanhere, University of New South Wales, Australia; Paolo Bellavista, University of Bologna, Italy
Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Paper
Author Tags
computation offloading
dnn inference
early exit
edge computing
Acceptance Rates
Overall Acceptance Rate: 398 of 1,577 submissions, 25%