A Top-Down Parallel Semisort

Authors:
Yan Gu, Carnegie Mellon University, Pittsburgh, PA, USA
Julian Shun, Carnegie Mellon University, Pittsburgh, PA, USA
Yihan Sun, Carnegie Mellon University, Pittsburgh, PA, USA
Guy E. Blelloch, Carnegie Mellon University, Pittsburgh, PA, USA

SPAA '15: Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, June 2015, Pages 24–34
https://doi.org/10.1145/2755573.2755597

Published: 13 June 2015

ABSTRACT

Semisorting is the problem of reordering an input array of keys so that equal keys are contiguous, while different keys need not appear in sorted order. Semisorting is important for collecting equal values and is widely used in practice. For example, it is the core of the MapReduce paradigm, is a key component of the database join operation, and has many other applications. We describe a (randomized) parallel algorithm for the problem that is theoretically efficient (linear work and logarithmic depth) but is designed to be more efficient in practice than previous algorithms. We use ideas from the parallel integer sorting algorithm of Rajasekaran and Reif, but instead of processing the bits of integers in a reduced range in a bottom-up fashion, we process the hashed values of keys directly top-down. We implement the algorithm and show experimentally, on a variety of input distributions, that it outperforms a similarly optimized radix sort on a modern 40-core machine with hyper-threading by a factor of about 1.7–1.9, and achieves a parallel speedup of up to 38x. We discuss the various optimizations used in our implementation and present an extensive experimental analysis of its performance.
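
To make the definition concrete, the following is a minimal sequential sketch in Python of what a semisort produces. It is not the parallel algorithm described in the paper: it only illustrates the general idea of bucketing records by the top bits of a hashed key and then grouping equal keys inside each bucket. The names `mix64`, `semisort_top_down`, `is_semisorted`, and the parameter `bucket_bits` are assumptions made for this illustration, and the multiplicative mixing step is a stand-in for whatever hash function an implementation would actually use.

```python
from collections import defaultdict

MASK64 = (1 << 64) - 1


def mix64(key):
    # Hypothetical mixing step (multiplicative hashing) that spreads
    # Python's built-in hash over 64 bits; an assumption for this sketch.
    return (hash(key) * 0x9E3779B97F4A7C15) & MASK64


def semisort_top_down(records, key=lambda r: r, bucket_bits=4):
    """Return records reordered so that equal keys are contiguous.

    Sequential illustration only. Records are first scattered into
    2**bucket_bits buckets by the top bits of the hashed key, then each
    bucket is grouped by the actual key, so hash collisions are harmless.
    """
    buckets = [[] for _ in range(1 << bucket_bits)]
    for r in records:
        buckets[mix64(key(r)) >> (64 - bucket_bits)].append(r)

    out = []
    for bucket in buckets:
        groups = defaultdict(list)  # key -> records carrying that key
        for r in bucket:
            groups[key(r)].append(r)
        for group in groups.values():
            out.extend(group)
    return out


def is_semisorted(records, key=lambda r: r):
    """Check that every distinct key occupies exactly one contiguous run."""
    seen = set()
    for i, r in enumerate(records):
        k = key(r)
        if i > 0 and k == key(records[i - 1]):
            continue  # still inside the current run
        if k in seen:
            return False  # this key already had an earlier run
        seen.add(k)
    return True


if __name__ == "__main__":
    data = ["b", "a", "c", "a", "b", "a"]
    result = semisort_top_down(data)
    print(result, is_semisorted(result))
```

The order in which the groups appear depends on the hash values and may differ between runs, which is exactly the freedom a semisort has compared with a full sort: only contiguity of equal keys is guaranteed.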

References

C. Balkesen, G. Alonso, J. Teubner, and M. T. Ozsu. Main-memory hash joins on modern processor architectures. In IEEE Transactions on Knowledge and Data Engineering (TKDE), 2014.
H. Bast and T. Hagerup. Fast parallel space allocation, estimation and integer sorting. Information and Computation, 123(1):72–110, 1995.
G. E. Blelloch, P. B. Gibbons, and H. V. Simhadri. Low depth cache-oblivious algorithms. In Proc. ACM Symp. on Parallelism in Algorithms and Architectures (SPAA), pages 189–199, 2010.
R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. J. ACM (JACM), 46(5), Sept. 1999.
R. P. Brent. The parallel evaluation of general arithmetic expressions. J. ACM (JACM), 21(2):201–206, 1974.
E. F. Codd. A relational model of data for large shared data banks. Commun. ACM (CACM), 13(6):377–387, June 1970.
R. Cole. Parallel merge sort. SIAM J. Comput., 17(4):770–785, 1988.
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms (3. ed.). MIT Press, 2009.
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM (CACM), 51(1):107–113, Jan. 2008.
P. B. Gibbons, Y. Matias, and V. Ramachandran. Efficient low-contention parallel algorithms. Journal of Computer and System Sciences, 53(3):417–442, 1996.
J. Gil, Y. Matias, and U. Vishkin. Towards a theory of nearly constant time parallel algorithms. In Foundations of Computer Science (FOCS), pages 698–710, 1991.
W. Hasenplaugh, T. Kaler, T. B. Schardl, and C. E. Leiserson. Ordering heuristics for parallel graph coloring. In Proc. ACM Symp. on Parallelism in Algorithms and Architectures (SPAA), pages 166–177, 2014.
J. Jaja. Introduction to Parallel Algorithms. Addison-Wesley Professional, 1992.
C. E. Leiserson. The Cilk++ concurrency platform. The Journal of Supercomputing, 51(3):244–257, 2010.
O. Polychroniou and K. A. Ross. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In Proc. ACM SIGMOD International Conference on Management of Data, pages 755–766, 2014.
S. Rajasekaran and J. H. Reif. Optimal and sublogarithmic time randomized parallel sorting algorithms. SIAM J. Comput., 18(3):594–607, 1989.
J. H. Reif and S. Sen. Parallel computational geometry: An approach using randomization. In J. Sack and J. Urrutia, editors, Handbook of Computational Geometry, chapter 18, pages 765–828. Elsevier Science, 1999.
J. Shun and G. E. Blelloch. Phase-concurrent hash tables for determinism. In Proc. ACM Symp. on Parallelism in Algorithms and Architectures (SPAA), 2014.
J. Shun, G. E. Blelloch, J. T. Fineman, P. B. Gibbons, A. Kyrola, H. V. Simhadri, and K. Tangwongsan. Brief announcement: the Problem Based Benchmark Suite. In Proc. ACM Symp. on Parallelism in Algorithms and Architectures (SPAA), pages 68–70, 2012.
J. Singler, P. Sanders, and F. Putze. MCSTL: The multi-core standard template library. In Euro-Par 2007 Parallel Processing, pages 682–694. Springer, 2007.
L. G. Valiant. General purpose parallel architectures. In Handbook of Theoretical Computer Science (Vol. A), pages 943–973. MIT Press, Cambridge, MA, USA, 1990.

Index Terms

A Top-Down Parallel Semisort
1. Theory of computation
  1. Design and analysis of algorithms
    1. Data structures design and analysis
      1. Sorting and searching

Published in
SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures
June 2015
362 pages
ISBN:9781450335881
DOI:10.1145/2755573
General Chair: Guy Blelloch, Carnegie Mellon University, USA
Program Chair: Kunal Agrawal, Washington University in St. Louis, USA
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
integer sorting
parallel algorithms
semisorting
Qualifiers
- research-article
Acceptance Rates
SPAA '15 paper acceptance rate: 31 of 131 submissions, 24%. Overall acceptance rate: 447 of 1,461 submissions, 31%.