ABSTRACT
Among high-radix, low-diameter networks, the fat-tree topology is commonly used in high-performance computing (HPC) and datacenter systems. Resource and job management on HPC systems is critical for mitigating application interference and thereby achieving high system performance and utilization. Preliminary studies have shown that job placement affects the performance of parallel scientific applications on fat-tree networks. In this work, we explore the joint effects of job placement and network routing on fat-tree systems, taking application communication patterns into account. Applications can be classified into groups according to their communication patterns. We combine several job placement policies and routing algorithms to create six configurations. System performance is analyzed using communication, hops, traffic, and saturation data obtained from fine-grained, high-fidelity discrete-event simulation. Initial experiments show that the performance of HPC applications depends not only on the communication pattern but also on job placement and network routing on fat-tree systems.
Index Terms
- Joint Effects of Application Communication Pattern, Job Placement and Network Routing on Fat-Tree Systems