DOI: 10.1145/2755573.2755595
research-article

Practical Massively Parallel Sorting

Published: 13 June 2015

ABSTRACT

Previous parallel sorting algorithms do not scale to the largest available machines, since they have either prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and close this gap both in theory and in practice. The algorithms are multi-level generalizations of the known algorithms sample sort and multiway mergesort. In particular, our sample sort variant turns out to be very scalable both in theory and in practice: it scales up to 2^15 MPI processes with outstanding performance, in particular for medium-sized inputs. Some tools we develop may be of independent interest: a simple, practical, and flexible sorting algorithm for very small inputs, a near-linear-time optimal algorithm for solving a constrained bin packing problem, and an algorithm for data delivery that guarantees a small number of message startups on each processor.
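For orientation, here is a minimal sketch of classic single-level sample sort, the algorithm the paper generalizes. It is not the authors' multi-level algorithm. Assumptions: the p processors are simulated sequentially as Python lists, the function name `sample_sort` and its `oversampling` and `seed` parameters are hypothetical, and the bucket redistribution in step 3 merely stands in for the all-to-all data exchange a real MPI implementation would perform.

```python
import bisect
import random
from typing import List


def sample_sort(pe_data: List[List[int]], oversampling: int = 16,
                seed: int = 0) -> List[List[int]]:
    """Hypothetical single-level sample sort, simulated on p 'processors'.

    pe_data[i] is the local input of processor i. Returns p lists such that
    each list is sorted and every element of list i is <= every element of
    list i + 1, so their concatenation is the globally sorted input.
    """
    p = len(pe_data)
    rng = random.Random(seed)

    # 1. Each processor draws a random sample (with replacement) of its data.
    sample: List[int] = []
    for local in pe_data:
        if local:
            sample.extend(rng.choice(local) for _ in range(oversampling))
    sample.sort()

    # 2. Pick p - 1 splitters at evenly spaced ranks of the sorted sample.
    splitters = [sample[(i * len(sample)) // p] for i in range(1, p)] if sample else []

    # 3. Each processor routes every element to bucket j by binary search over
    #    the splitters; bucket j is "sent" to processor j. This loop stands in
    #    for the all-to-all exchange (assumption: no real communication here).
    received: List[List[int]] = [[] for _ in range(p)]
    for local in pe_data:
        for x in local:
            received[bisect.bisect_right(splitters, x)].append(x)

    # 4. Each processor sorts what it received locally.
    return [sorted(bucket) for bucket in received]


if __name__ == "__main__":
    pes = [[5, 3, 9], [1, 8, 2], [7, 4, 6]]
    out = sample_sort(pes)
    # Concatenating the per-processor outputs yields the sorted input.
    assert [x for b in out for x in b] == sorted(x for b in pes for x in b)
    print(out)
```

In this single-level form, every processor may send a message to every other processor, so the number of message startups per processor grows with p. Keeping that number small at large scale is, as the abstract indicates, what the multi-level generalization and the data-delivery algorithm target; the sketch does not attempt either.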


Published in

SPAA '15: Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures (ACM Conferences)
June 2015
362 pages
ISBN: 9781450335881
DOI: 10.1145/2755573
• General Chair: Guy Blelloch
• Program Chair: Kunal Agrawal

          Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States



Acceptance Rates

SPAA '15 paper acceptance rate: 31 of 131 submissions (24%). Overall acceptance rate: 447 of 1,461 submissions (31%).
