ABSTRACT
The recent ground-breaking advances in deep neural networks (DNNs) make them attractive for embedded systems. However, DNN inference can take a long time on resource-limited embedded devices. Offloading the computation to the cloud is often infeasible due to privacy concerns, high latency, or the lack of connectivity. As such, there is a critical need for a way to execute DNN models effectively on the devices themselves.
This paper presents an adaptive scheme that determines which DNN model to use for a given input, taking into account the desired accuracy and inference time. Our approach employs machine learning to build a predictive model that quickly selects a pre-trained DNN for a given input under a given optimization constraint. We achieve this by first training the predictive model off-line, and then using the learned model to select a DNN for new, unseen inputs. We apply our approach to the image classification task and evaluate it on a Jetson TX2 embedded deep learning platform using the ImageNet ILSVRC 2012 validation dataset, considering a range of influential DNN models. Experimental results show that our approach achieves a 7.52% improvement in inference accuracy and a 1.8x reduction in inference time over the most capable single DNN model.
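The two-phase scheme described above can be sketched in a few lines. This is a minimal illustration only, not the authors' actual implementation: the k-NN premodel, the feature vectors, and the candidate model labels are all hypothetical stand-ins for whatever features and selector the paper trains off-line.

```python
# Sketch of an offline-trained "premodel" that routes each input to a DNN.
# All names and numbers here are illustrative assumptions, not the paper's.
import math

def knn_select(train, features, k=3):
    """Pick the DNN chosen by a majority of the k nearest training inputs.

    train: list of (feature_vector, best_model_label) pairs, gathered
    off-line by profiling every candidate DNN on every training input.
    """
    dists = sorted((math.dist(f, features), label) for f, label in train)
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Offline phase: hypothetical cheap image features (e.g. normalized edge
# density, brightness) paired with the cheapest DNN that gets each input right.
training_data = [
    ([0.10, 0.20], "MobileNet"),
    ([0.15, 0.25], "MobileNet"),
    ([0.80, 0.90], "ResNet-152"),
    ([0.75, 0.85], "ResNet-152"),
    ([0.50, 0.50], "ResNet-50"),
]

# Online phase: a new, unseen input is routed with negligible overhead;
# only the selected DNN is then actually run on the device.
print(knn_select(training_data, [0.12, 0.22]))  # prints "MobileNet"
```

The payoff is that "easy" inputs run on a cheap, fast model while only hard inputs pay for a deep, slow one, which is how the scheme can beat the single most capable DNN on both accuracy and latency.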
Index Terms
- Adaptive deep learning model selection on embedded systems