DOI: 10.1145/3498361.3539765
research-article

Memory-efficient DNN training on mobile devices

Published: 27 June 2022

ABSTRACT

On-device deep neural network (DNN) training holds the potential to enable a rich set of privacy-aware and infrastructure-independent personalized mobile applications. However, despite advancements in mobile hardware, locally training a complex DNN is still a nontrivial task given its resource demands. In this work, we show that the limited memory resources on mobile devices are the main constraint and propose Sage as a framework for efficiently optimizing memory resources for on-device DNN training. Specifically, Sage configures a flexible computation graph for DNN gradient evaluation and reduces the memory footprint of the graph using operator- and graph-level optimizations. At run time, Sage employs a hybrid of gradient checkpointing and micro-batching techniques to dynamically adjust its memory use to the available system memory budget. Using an implementation on off-the-shelf smartphones, we show that Sage enables local training of complex DNN models by reducing memory use by more than 20-fold compared to a baseline approach. We also show that Sage successfully adapts to run-time memory budget variations, and we evaluate its energy consumption to demonstrate Sage's practical applicability.
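As a rough illustration of the two run-time techniques named in the abstract, the sketch below combines gradient checkpointing with micro-batching (gradient accumulation) in PyTorch so that peak activation memory is bounded by the micro-batch size rather than the full batch. This is a minimal sketch, not Sage's actual implementation: the model, the `micro_batch_size` parameter, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not Sage's implementation): gradient checkpointing plus
# micro-batching to bound peak training memory. The model architecture,
# sizes, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

blocks = nn.ModuleList(
    [nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(16)]
)
head = nn.Linear(512, 10)
optimizer = torch.optim.SGD(
    list(blocks.parameters()) + list(head.parameters()), lr=0.01
)
loss_fn = nn.CrossEntropyLoss()

def forward_checkpointed(x):
    # Gradient checkpointing: discard intermediate activations during the
    # forward pass and recompute them on demand in the backward pass,
    # trading extra compute for a smaller activation footprint.
    for block in blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return head(x)

def train_step(inputs, targets, micro_batch_size=8):
    # Micro-batching: accumulate gradients over small slices of the batch,
    # so peak activation memory scales with the micro-batch size rather
    # than the full batch size.
    optimizer.zero_grad()
    num_micro = (inputs.size(0) + micro_batch_size - 1) // micro_batch_size
    for x, y in zip(inputs.split(micro_batch_size), targets.split(micro_batch_size)):
        loss = loss_fn(forward_checkpointed(x), y) / num_micro
        loss.backward()
    optimizer.step()

# Toy usage: a batch of 32 examples processed as four micro-batches of 8.
train_step(torch.randn(32, 512), torch.randint(0, 10, (32,)))
```

In a budget-adaptive setting such as the one the abstract describes, the micro-batch size and the set of checkpointed layers would be chosen at run time from the available memory budget; here they are fixed constants purely for illustration.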


Published in
MobiSys '22: Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services
June 2022, 668 pages
ISBN: 9781450391856
DOI: 10.1145/3498361
Copyright © 2022 ACM


Publisher
Association for Computing Machinery, New York, NY, United States



Acceptance Rates
Overall acceptance rate: 274 of 1,679 submissions, 16%
