Horus: An Interference-Aware Resource Manager for Deep Learning Systems

Conference paper in Algorithms and Architectures for Parallel Processing (ICA3PP 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12453)

Abstract

Deep Learning (DL) models are deployed as jobs within machines containing GPUs. These DL systems - ranging from a single GPU device to machine clusters - require state-of-the-art resource management to increase resource utilization and job throughput. While co-location - multiple jobs sharing the same GPU - has been identified as an effective means to achieve this, it incurs performance interference that directly degrades DL training and inference performance. Existing approaches to mitigating interference require resource-intensive and time-consuming kernel profiling that is ill-suited for runtime scheduling decisions, and current DL system resource managers are not designed to deal with these problems. This paper proposes Horus, an interference-aware resource manager for DL systems. Instead of relying on expensive kernel profiling, our approach estimates job resource utilization and co-location patterns to determine DL job placements that minimize the likelihood of interference and improve system resource utilization and makespan. Our analysis shows that interference causes up to a 3.2x DL job slowdown. We integrated our approach into the Kubernetes resource manager and conducted experiments in a DL cluster, training 2,500 DL jobs using 13 different model types. Results demonstrate that Horus outperforms other DL resource managers by up to 61.5% in resource utilization and 33.6% in makespan.
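
The abstract summarizes the placement mechanism only at a high level, so the following minimal sketch is illustrative rather than a reproduction of Horus: it shows how a scheduler can use predicted per-job GPU utilization (instead of kernel profiling) to co-locate jobs only where the combined load is unlikely to cause interference. All names, the 0.9 saturation threshold, and the constant fallback predictor are assumptions for illustration, not details from the paper.

    # Hypothetical sketch of interference-aware placement via predicted
    # GPU utilization; names and thresholds are illustrative, not Horus's.
    from dataclasses import dataclass, field

    GPU_SATURATION = 0.9  # assumed load ceiling above which co-location interferes


    @dataclass
    class Gpu:
        name: str
        jobs: list = field(default_factory=list)

        @property
        def load(self) -> float:
            # Predicted aggregate utilization of all jobs co-located on this GPU.
            return sum(j["util"] for j in self.jobs)


    def predict_utilization(job: dict) -> float:
        """Stand-in for a learned utilization model; the paper estimates
        utilization from job features rather than expensive kernel profiling."""
        return job.get("util", 0.5)


    def place(job: dict, gpus: list) -> Gpu:
        util = predict_utilization(job)
        # Prefer the most-loaded GPU that still fits the job (bin packing),
        # so co-location raises utilization without predicted interference.
        feasible = [g for g in gpus if g.load + util <= GPU_SATURATION]
        target = (max(feasible, key=lambda g: g.load) if feasible
                  else min(gpus, key=lambda g: g.load))  # fall back: least loaded
        target.jobs.append({**job, "util": util})
        return target


    if __name__ == "__main__":
        cluster = [Gpu("gpu0"), Gpu("gpu1")]
        for u in (0.5, 0.3, 0.4):
            g = place({"model": "resnet50", "util": u}, cluster)
            print(g.name, round(g.load, 2))
        # Jobs of 0.5 and 0.3 pack onto gpu0 (0.8 <= 0.9); 0.4 spills to gpu1.

In a Kubernetes deployment such as the one the paper evaluates, a score of this kind would typically be exposed through the scheduler's extension points rather than by replacing the scheduler outright.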


Notes

  1. Which we refer to as DL resource managers.

  2. https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/profiler

  3. https://pytorch.org/docs/stable/jit.html


Author information

Corresponding author

Correspondence to Renyu Yang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Yeung, G., Borowiec, D., Yang, R., Friday, A., Harper, R., Garraghan, P. (2020). Horus: An Interference-Aware Resource Manager for Deep Learning Systems. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science, vol 12453. Springer, Cham. https://doi.org/10.1007/978-3-030-60239-0_33
