Joint Effects of Application Communication Pattern, Job Placement and Network Routing on Fat-Tree Systems

Authors:
Peixin Qiao, Illinois Institute of Technology, Chicago, Illinois
Xin Wang, Illinois Institute of Technology, Chicago, Illinois
Xu Yang, Illinois Institute of Technology, Chicago, Illinois
Yuping Fan, Illinois Institute of Technology, Chicago, Illinois
Zhiling Lan, Illinois Institute of Technology, Chicago, Illinois

ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing, August 2018, Article No. 36, Pages 1–10. https://doi.org/10.1145/3229710.3229747

Published: 13 August 2018

ABSTRACT

Among high-radix, low-diameter networks, the fat-tree topology is commonly used in high-performance computing (HPC) and datacenter systems. Resource and job management on HPC systems is critical for mitigating application interference and thereby achieving high system performance and utilization. Preliminary studies have shown that job placement affects the performance of parallel scientific applications on fat-tree networks. In this work, we explore the joint effects of job placement and network routing on fat-tree systems, taking application communication patterns into account. Applications can be classified into groups according to their communication patterns. We further combine job placement policies with routing algorithms to create six different configurations. System performance is analyzed through fine-grained, high-fidelity discrete event-driven simulation, using communication, hop, traffic, and saturation data. Initial experiments show that the performance of HPC applications depends not only on their communication patterns but also on job placement and network routing on fat-tree systems.
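The sketch below illustrates why job placement changes hop counts on a fat-tree. It computes shortest-path hop counts in a three-level k-ary fat-tree. It assumes the common layout with k/2 hosts per edge switch and (k/2)^2 hosts per pod. This is not the paper's simulator or its placement policies. The host numbering, the helpers `contiguous_placement` and `random_placement`, and the choice of k = 8 with a 16-host job are hypothetical, chosen for illustration only.

```python
import random
from itertools import combinations


def host_location(host, k):
    """Map a host index to (pod, edge switch) in a k-ary three-level fat-tree.

    Assumed (hypothetical) numbering: hosts are numbered consecutively,
    k/2 hosts per edge switch and (k/2)**2 hosts per pod.
    """
    hosts_per_edge = k // 2
    hosts_per_pod = (k // 2) ** 2
    pod = host // hosts_per_pod
    edge = (host % hosts_per_pod) // hosts_per_edge
    return pod, edge


def hop_count(a, b, k):
    """Number of links on a shortest path between hosts a and b."""
    if a == b:
        return 0
    pod_a, edge_a = host_location(a, k)
    pod_b, edge_b = host_location(b, k)
    if pod_a == pod_b and edge_a == edge_b:
        return 2  # host -> edge -> host
    if pod_a == pod_b:
        return 4  # host -> edge -> aggregation -> edge -> host
    return 6  # host -> edge -> aggregation -> core -> aggregation -> edge -> host


def mean_pairwise_hops(hosts, k):
    """Average hop count over all unordered pairs of hosts in a job."""
    pairs = list(combinations(hosts, 2))
    return sum(hop_count(a, b, k) for a, b in pairs) / len(pairs)


def contiguous_placement(job_size, start=0):
    """Hypothetical policy: allocate consecutively numbered hosts."""
    return list(range(start, start + job_size))


def random_placement(job_size, total_hosts, seed=0):
    """Hypothetical policy: allocate hosts uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(total_hosts), job_size)


if __name__ == "__main__":
    k = 8
    total_hosts = k ** 3 // 4  # a k-ary fat-tree has k^3/4 hosts (128 for k = 8)
    job_size = 16
    print(mean_pairwise_hops(contiguous_placement(job_size), k))  # 3.6
    print(mean_pairwise_hops(random_placement(job_size, total_hosts), k))
```

With these assumptions, the contiguous job fills exactly one pod (four full edge switches), so its mean is 3.6 hops. Under random placement, most host pairs land in different pods and need 6 hops, so the mean is typically noticeably higher.

The sketch deliberately ignores routing. In a fat-tree, all minimal paths between two hosts have the same length. The choice of routing algorithm therefore affects which links carry the traffic, and hence contention and saturation, rather than the hop count itself.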

Index Terms

1. Networks
  1. Network performance evaluation
    1. Network performance analysis
    2. Network simulations


Published in

ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing
August 2018
409 pages
ISBN:9781450365239
DOI:10.1145/3229710

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 August 2018
Author Tags
HPC parallel application
discrete event-driven simulation
fat-tree topology
interference
resource and job management
Qualifiers
- research-article
- Research
- Refereed limited

Acceptance Rates
Overall Acceptance Rate: 91 of 313 submissions, 29%