Abstract
Deep Learning (DL) models are deployed as jobs on machines containing GPUs. These DL systems - ranging from a single GPU device to machine clusters - require state-of-the-art resource management to increase resource utilization and job throughput. While co-location - placing multiple jobs within the same GPU - has been identified as an effective means to achieve this, it incurs performance interference that directly degrades DL training and inference performance. Existing approaches to mitigating interference require resource-intensive and time-consuming kernel profiling, which is ill-suited for runtime scheduling decisions, and current DL system resource managers are not designed to deal with these problems. This paper proposes Horus, an interference-aware resource manager for DL systems. Instead of relying on expensive kernel profiling, our approach estimates job resource utilization and co-location patterns to determine effective DL job placements that minimize the likelihood of interference while improving system resource utilization and makespan. Our analysis shows that interference causes up to a 3.2x DL job slowdown. We integrated our approach within the Kubernetes resource manager and conducted experiments in a DL cluster by training 2,500 DL jobs using 13 different model types. Results demonstrate that Horus outperforms other DL resource managers by up to 61.5% for resource utilization and 33.6% for makespan.
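The placement idea the abstract describes - estimating each job's GPU utilization and avoiding co-locations likely to interfere - can be illustrated with a minimal sketch. This is a hypothetical heuristic written for illustration only, not the actual Horus algorithm; the function name, the utilization threshold, and the greedy selection rule are all assumptions.

```python
# Hypothetical sketch of utilization-based, interference-aware placement.
# Assumptions (not from the paper): per-job utilization estimates are
# available, and combined utilization above `capacity` is treated as a
# proxy for a high risk of co-location interference.

def place_job(job_util, gpu_loads, capacity=100.0):
    """Pick the GPU whose predicted utilization after co-locating a job
    with estimated utilization `job_util` remains lowest, skipping any
    GPU whose combined load would exceed `capacity`.

    gpu_loads: dict mapping GPU id -> current estimated utilization (%).
    Returns the chosen GPU id, or None if no safe placement exists
    (e.g. the job should queue rather than risk interference).
    """
    best_gpu, best_load = None, float("inf")
    for gpu, load in gpu_loads.items():
        combined = load + job_util
        if combined <= capacity and combined < best_load:
            best_gpu, best_load = gpu, combined
    return best_gpu
```

For example, a job estimated at 30% utilization would be placed on a GPU at 40% load rather than one at 80%, since the latter would push combined utilization past the threshold and risk the multi-fold slowdowns the paper measures.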
Notes
- 1. Which we refer to as DL resource managers.
© 2020 Springer Nature Switzerland AG
Cite this paper
Yeung, G., Borowiec, D., Yang, R., Friday, A., Harper, R., Garraghan, P. (2020). Horus: An Interference-Aware Resource Manager for Deep Learning Systems. In: Qiu, M. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2020. Lecture Notes in Computer Science(), vol 12453. Springer, Cham. https://doi.org/10.1007/978-3-030-60239-0_33
DOI: https://doi.org/10.1007/978-3-030-60239-0_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60238-3
Online ISBN: 978-3-030-60239-0
eBook Packages: Mathematics and Statistics (R0)