DOI: 10.1145/3211332.3211336

Adaptive deep learning model selection on embedded systems

Published: 19 June 2018

ABSTRACT

The recent ground-breaking advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for a DNN to make an inference on resource-limited embedded devices. Offloading the computation to the cloud is often infeasible due to privacy concerns, high latency, or a lack of connectivity. There is therefore a critical need for a way to execute DNN models effectively on the device itself.

This paper presents an adaptive scheme that determines which DNN model to use for a given input, taking into account the desired accuracy and inference time. Our approach employs machine learning to build a predictive model that quickly selects a pre-trained DNN for a given input and optimization constraint. We achieve this by first training the predictive model off-line, and then using the learnt model to choose a DNN for new, unseen inputs. We apply our approach to the image classification task and evaluate it on an NVIDIA Jetson TX2 embedded deep learning platform using the ImageNet ILSVRC 2012 validation dataset, considering a range of influential DNN models. Experimental results show that our approach delivers a 7.52% improvement in inference accuracy and a 1.8x reduction in inference time over the most capable single DNN model.
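
To make the selection scheme concrete, the sketch below shows one way such a premodel-driven selector could be wired up. It is a minimal sketch, not the paper's implementation: the cheap image features, the k-nearest-neighbour premodel, and the candidate model names are assumptions made purely for illustration.

```python
# Minimal, illustrative sketch of adaptive DNN model selection.
# Assumptions: cheap_features(), CANDIDATE_DNNS and the KNN premodel are
# hypothetical stand-ins, not the exact design described in the paper.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Candidate DNNs, ordered from cheapest to most expensive to run (hypothetical names).
CANDIDATE_DNNS = ["mobilenet", "inception_v2", "resnet_152"]

def cheap_features(image):
    """Inexpensive per-image features; far cheaper than running any DNN."""
    return np.array([image.mean(), image.std(),
                     np.abs(np.diff(image, axis=0)).mean()])

def train_premodel(train_images, cheapest_ok_labels):
    """Off-line: learn to predict, for each image, the index of the cheapest
    candidate DNN that satisfies the accuracy constraint on that image."""
    X = np.stack([cheap_features(img) for img in train_images])
    premodel = KNeighborsClassifier(n_neighbors=5)
    premodel.fit(X, cheapest_ok_labels)  # labels index into CANDIDATE_DNNS
    return premodel

def classify(image, premodel, dnns):
    """On-line: pick one DNN with the premodel, then run only that DNN."""
    idx = int(premodel.predict(cheap_features(image).reshape(1, -1))[0])
    return dnns[CANDIDATE_DNNS[idx]](image)  # dnns maps name -> inference function
```

For such a scheme to pay off at deployment time, the cost of the premodel (feature extraction plus one classifier lookup) must remain negligible compared with a single DNN inference, since only the selected DNN is then executed for each input.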

Published in

LCTES 2018: Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, June 2018, 112 pages. ISBN: 9781450358033. DOI: 10.1145/3211332.

Copyright © 2018 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article

Acceptance rate: 116 of 438 submissions, 26%
