ABSTRACT
Previous parallel sorting algorithms do not scale to the largest available machines, since they have either prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and close this gap both in theory and in practice. The algorithms are multi-level generalizations of the known algorithms sample sort and multiway mergesort. In particular, our sample sort variant turns out to be very scalable: it scales up to 2^15 MPI processes with outstanding performance, in particular for medium-sized inputs. Some tools we develop may be of independent interest: a simple, practical, and flexible sorting algorithm for very small inputs; a near-linear-time optimal algorithm for solving a constrained bin packing problem; and an algorithm for data delivery that guarantees a small number of message startups on each processor.
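For readers unfamiliar with the baseline, the following is a minimal sequential sketch of classic single-level sample sort, the known algorithm that the abstract says is generalized to multiple levels. It is not the multi-level variant described in the paper. The function name `sample_sort`, the parameters `p` and `oversampling`, and the simulation of processors as Python lists are all assumptions made for illustration.

```python
import random
from bisect import bisect_right

def sample_sort(data, p, oversampling=4, seed=0):
    """Simulate single-level sample sort on p hypothetical processors."""
    if p <= 1 or len(data) <= p:
        return sorted(data)
    rng = random.Random(seed)
    # Each simulated processor starts with a share of the input.
    shares = [data[i::p] for i in range(p)]
    # Draw a random sample and pick p - 1 evenly spaced splitters from it.
    sample = sorted(rng.choice(data) for _ in range(oversampling * p))
    splitters = [sample[i * oversampling] for i in range(1, p)]
    # "Data delivery": every element goes to the bucket (processor) whose
    # splitter range contains it. In a real implementation this would be
    # an all-to-all message exchange.
    buckets = [[] for _ in range(p)]
    for share in shares:
        for x in share:
            buckets[bisect_right(splitters, x)].append(x)
    # Each processor sorts its bucket locally; buckets are already ordered
    # relative to each other, so concatenation yields the sorted output.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

In this single-level form every processor may exchange data with every other processor, which is one source of the scaling problem the abstract describes; reducing that cost is the motivation for a multi-level scheme.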
Index Terms
- Practical Massively Parallel Sorting