ABSTRACT
As mobile devices continuously generate streams of images and videos, a new class of mobile deep vision applications is rapidly emerging; these applications usually run deep neural networks on the multimedia data in real time. To support them, offloading computation, especially neural network inference, from mobile devices to edge clouds has proved effective. Existing solutions often assume a dedicated, powerful server to which the entire inference can be offloaded. In reality, however, such a server may not be available, and we must make do with less powerful ones. To address these more practical situations, we propose partitioning the video frame and offloading the partial inference tasks to multiple servers for parallel processing. This paper presents the design of Elf, a framework that accelerates mobile deep vision applications under any server provisioning through parallel offloading. Elf employs a recurrent region proposal prediction algorithm, a region-proposal-centric frame partitioning, and a resource-aware multi-offloading scheme. We implement and evaluate Elf on Linux and Android platforms using four commercial mobile devices and three deep vision applications with ten state-of-the-art models. Comprehensive experiments show that Elf can speed up the applications by 4.85× and reduce bandwidth usage by 52.6%, while sacrificing less than 1% of application accuracy.
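To make the idea of resource-aware parallel offloading concrete, the following is a minimal illustrative sketch, not Elf's actual algorithm. It assumes region proposals are given as (x, y, w, h) boxes, that a box's pixel area approximates its inference cost, and that each server advertises a relative compute capacity. The `Server` class, the `assign_regions` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity: float                      # hypothetical relative compute capacity
    boxes: list = field(default_factory=list)
    load: float = 0.0                    # total pixel area assigned so far


def assign_regions(boxes, servers):
    """Greedily assign (x, y, w, h) boxes to servers, largest box first.

    Each box goes to the server whose capacity-normalized load would be
    smallest after receiving it (an assumed heuristic, not Elf's scheme).
    """
    for box in sorted(boxes, key=lambda b: b[2] * b[3], reverse=True):
        area = box[2] * box[3]
        target = min(servers, key=lambda s: (s.load + area) / s.capacity)
        target.boxes.append(box)
        target.load += area
    return servers


# Made-up region proposals and two servers of unequal capacity.
boxes = [(0, 0, 400, 300), (500, 100, 200, 200), (50, 400, 100, 100), (800, 50, 300, 150)]
servers = assign_regions(boxes, [Server("strong", 2.0), Server("weak", 1.0)])
for s in servers:
    print(s.name, s.load, s.boxes)
```

In this sketch the stronger server receives a larger share of the pixel area, so the per-server partial inferences would finish at roughly similar times if cost scaled with area; a real system would also need to account for network bandwidth and measured server load.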
Index Terms
- Elf: accelerate high-resolution mobile deep vision with content-aware parallel offloading