DOI: 10.1145/3229710.3229747
research-article

Joint Effects of Application Communication Pattern, Job Placement and Network Routing on Fat-Tree Systems

Published: 13 August 2018

ABSTRACT

Among high-radix, low-diameter networks, the fat-tree topology is commonly used in high-performance computing (HPC) and datacenter systems. Resource and job management on HPC systems is critical for mitigating application interference and thereby achieving high system performance and utilization. Preliminary studies have shown that job placement affects the performance of parallel scientific applications on fat-tree networks. In this work, we explore the joint effects of job placement and network routing on fat-tree systems, taking application communication patterns into account. Applications can be classified into groups according to their communication patterns. We combine several job placement policies and routing algorithms to create six different configurations. We analyze system performance using communication, hop, traffic, and saturation data obtained from fine-grained, high-fidelity discrete-event simulation. Initial experiments show that the performance of HPC applications depends not only on the communication pattern but also on job placement and network routing on fat-tree systems.
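
To make the hop metric concrete, the following is a minimal sketch of how job placement can change the hop count between the nodes of a single job. It assumes a three-level k-ary fat-tree in which hosts are numbered consecutively under each edge switch and pod. The topology size, the numbering scheme, the function names `fat_tree_hops` and `mean_pairwise_hops`, and the two example placements are all illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def fat_tree_hops(a: int, b: int, k: int) -> int:
    """Link hops between hosts a and b in a three-level k-ary fat-tree.

    Assumption: hosts are numbered consecutively, so each edge switch
    serves k/2 adjacent host IDs and each pod serves (k/2)^2 of them.
    """
    if a == b:
        return 0
    hosts_per_edge = k // 2
    hosts_per_pod = (k // 2) ** 2
    if a // hosts_per_edge == b // hosts_per_edge:
        return 2  # host -> edge switch -> host
    if a // hosts_per_pod == b // hosts_per_pod:
        return 4  # up to an aggregation switch and back down
    return 6      # up to a core switch and back down


def mean_pairwise_hops(nodes, k):
    """Average hop count over all unordered pairs of nodes in one job."""
    pairs = list(combinations(nodes, 2))
    return sum(fat_tree_hops(a, b, k) for a, b in pairs) / len(pairs)


k = 4                       # hypothetical 4-ary fat-tree with 16 hosts
contiguous = [0, 1, 2, 3]   # hypothetical job packed into one pod
scattered = [0, 4, 8, 12]   # hypothetical job with one node per pod

print(mean_pairwise_hops(contiguous, k))  # about 3.33
print(mean_pairwise_hops(scattered, k))   # 6.0
```

Under these assumptions, the packed placement keeps every message within one pod, while the scattered placement forces every message through a core switch. Whether the longer paths actually hurt performance depends on the application's communication pattern and on the routing algorithm, which is the interaction the abstract describes.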

      • Published in

        ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing
        August 2018
        409 pages
        ISBN: 9781450365239
        DOI: 10.1145/3229710

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 91 of 313 submissions, 29%
