
Enabling Inference and Training of Deep Learning Models for AI Applications on IoT Edge Devices

  • Chapter
  • In: Artificial Intelligence-based Internet of Things Systems

Part of the book series: Internet of Things (ITTCC)


Abstract

IoT edge devices sense and process data to support real-time decision-making in latency-sensitive and mission-critical applications such as autonomous driving, industrial automation, safety compliance, and security-threat monitoring. Running AI at the edge enables intelligent real-time decisions to be made on the device itself. Moreover, on-device AI is vital to preserving data privacy. Hence, edge AI is an active research and engineering topic at major technology corporations, at numerous start-ups, and in academia.

Deep learning models have achieved tremendous improvements in prediction accuracy, approaching or surpassing human-level performance on several tasks. These models are typically large and hence not suited to resource-constrained edge devices or to real-time inference. Training deep learning models on an edge device is also challenging, because training requires large amounts of data and compute resources.
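For a sense of scale, a standard ResNet-50 image classifier has roughly 25 million parameters; stored as 32-bit floats that is about 25M × 4 bytes ≈ 100 MB of weights before any compression, a footprint that already strains mobile- and microcontroller-class edge hardware.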

We present ongoing research on optimizing deep learning models for inference at the edge using connection pruning, model quantization, and knowledge distillation. We then describe techniques to train or retrain deep learning models on resource-constrained edge devices using new learning paradigms such as federated learning, weight imprinting, and training smaller models on less data.
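As a minimal illustration of one of these optimizations, the following Python sketch applies post-training dynamic-range quantization with the TensorFlow Lite converter; the choice of MobileNetV2 as the model and the output file name are illustrative assumptions rather than details taken from the chapter.

    import tensorflow as tf

    # Load a pretrained Keras model. MobileNetV2 is only an illustrative
    # backbone here; any Keras model can be converted the same way.
    model = tf.keras.applications.MobileNetV2(weights="imagenet")

    # Post-training dynamic-range quantization: the converter stores weights
    # as 8-bit integers and dequantizes them at runtime, shrinking the model
    # roughly 4x with little accuracy loss for many networks.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    # The resulting flatbuffer can be shipped to an edge device and run with
    # the TFLite interpreter.
    with open("mobilenet_v2_dynamic_range.tflite", "wb") as f:
        f.write(tflite_model)

Full integer quantization, which also quantizes activations, additionally requires a small representative dataset for calibration; connection pruning and knowledge distillation, by contrast, are applied during or after training rather than at conversion time.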



Author information


Correspondence to Santonu Sarkar.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Sharma, D., Sarkar, S. (2022). Enabling Inference and Training of Deep Learning Models for AI Applications on IoT Edge Devices. In: Pal, S., De, D., Buyya, R. (eds) Artificial Intelligence-based Internet of Things Systems. Internet of Things. Springer, Cham. https://doi.org/10.1007/978-3-030-87059-1_10


  • DOI: https://doi.org/10.1007/978-3-030-87059-1_10


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87058-4

  • Online ISBN: 978-3-030-87059-1

  • eBook Packages: Computer Science, Computer Science (R0)
