Abstract
Deep Learning (DL) models are deployed as jobs on machines containing GPUs. These DL systems - ranging from a single GPU device to machine clusters - require state-of-the-art resource management to increase resource utilization and job throughput. While co-location - placing multiple jobs within the same GPU - has been identified as an effective means to achieve this, it incurs performance interference that directly degrades DL training and inference performance. Existing approaches to mitigating interference require resource-intensive and time-consuming kernel profiling, which is ill-suited for runtime scheduling decisions, and current DL system resource managers are not designed to deal with these problems. This paper proposes Horus, an interference-aware resource manager for DL systems. Instead of relying on expensive kernel profiling, our approach estimates job resource utilization and co-location patterns to determine effective DL job placements that minimize the likelihood of interference while improving system resource utilization and makespan. Our analysis shows that interference causes up to a 3.2x DL job slowdown. We integrated our approach within the Kubernetes resource manager and conducted experiments in a DL cluster by training 2,500 DL jobs using 13 different model types. Results demonstrate that Horus outperforms other DL resource managers by up to 61.5% for resource utilization and 33.6% for makespan.
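The placement idea the abstract describes - estimating each job's GPU utilization and avoiding co-locations likely to interfere - can be illustrated with a minimal sketch. This is a hypothetical heuristic written for illustration only, not the actual Horus algorithm; the function name, the utilization threshold, and the greedy selection rule are all assumptions.

```python
# Hypothetical sketch of utilization-based, interference-aware placement.
# Assumptions (not from the paper): per-job utilization estimates are
# available, and combined utilization above `capacity` is treated as a
# proxy for a high risk of co-location interference.

def place_job(job_util, gpu_loads, capacity=100.0):
    """Pick the GPU whose predicted utilization after co-locating a job
    with estimated utilization `job_util` remains lowest, skipping any
    GPU whose combined load would exceed `capacity`.

    gpu_loads: dict mapping GPU id -> current estimated utilization (%).
    Returns the chosen GPU id, or None if no safe placement exists
    (e.g. the job should queue rather than risk interference).
    """
    best_gpu, best_load = None, float("inf")
    for gpu, load in gpu_loads.items():
        combined = load + job_util
        if combined <= capacity and combined < best_load:
            best_gpu, best_load = gpu, combined
    return best_gpu
```

For example, a job estimated at 30% utilization would be placed on a GPU at 40% load rather than one at 80%, since the latter would push combined utilization past the threshold and risk the multi-fold slowdowns the paper measures.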
Notes
- 1. Which we refer to as DL resource managers.
© 2020 Springer Nature Switzerland AG
Cite this paper
Yeung, G., Borowiec, D., Yang, R., Friday, A., Harper, R., Garraghan, P. (2020). Horus: An Interference-Aware Resource Manager for Deep Learning Systems. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science(), vol 12453. Springer, Cham. https://doi.org/10.1007/978-3-030-60239-0_33
DOI: https://doi.org/10.1007/978-3-030-60239-0_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60238-3
Online ISBN: 978-3-030-60239-0
eBook Packages: Mathematics and Statistics (R0)