
A load-balanced parallel sorting algorithm for shared-nothing architectures

Distributed and Parallel Databases

Abstract

With the popularity of parallel database machines based on the shared-nothing architecture, it has become important to find external sorting algorithms that lead to a load-balanced computation, i.e., balanced execution, communication, and output. If each processor is equally loaded throughout the sorting algorithm, parallelism is fully exploited. Similarly, balanced communication avoids congesting the network. Since sorting supports a number of other relational operations (joins, duplicate elimination, index building, etc.), data skew produced by sorting can lead to execution skew at later stages of these operations. In this paper we present a load-balanced parallel sorting algorithm for shared-nothing architectures. It is a multiple-input multiple-output algorithm with four stages, based on a generalization of Batcher's odd-even merge. At each stage the n keys are evenly distributed among the p processors (i.e., there is no final sequential merge phase), and the distribution of keys between stages guards against network congestion. No assumption is made about the key distribution, and the algorithm performs equally well in the presence of duplicate keys. Hence our approach always guarantees its performance, as long as n is greater than p^3, which is the case of interest for sorting large relations. In addition, processors can be added incrementally.
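
For orientation, the classical two-way odd-even merge due to Batcher, which the abstract names as the starting point, can be sketched as follows. This is not the paper's four-stage generalization, whose details the abstract does not give; it is only the textbook building block. It is restricted here to two sorted inputs of equal power-of-two length, and the function names `odd_even_merge` and `odd_even_merge_sort` are hypothetical.

```python
def odd_even_merge(x, y):
    """Classical Batcher odd-even merge of two sorted lists.

    Assumption: x and y are sorted and have the same power-of-two length.
    This is the textbook scheme, not the generalized merge of the paper.
    """
    n = len(x)
    if n != len(y) or n == 0 or n & (n - 1):
        raise ValueError("inputs must have equal power-of-two length")
    if n == 1:
        # Base case: a single compare-exchange.
        return [min(x[0], y[0]), max(x[0], y[0])]
    # Recursively merge the even-indexed and the odd-indexed keys.
    evens = odd_even_merge(x[0::2], y[0::2])
    odds = odd_even_merge(x[1::2], y[1::2])
    # Interleave the two results, fixing each adjacent pair with one
    # compare-exchange; the first even key and last odd key are in place.
    merged = [evens[0]]
    for e, o in zip(evens[1:], odds[:-1]):
        merged += [min(e, o), max(e, o)]
    merged.append(odds[-1])
    return merged


def odd_even_merge_sort(keys):
    """Sort a list whose length is a power of two using odd_even_merge."""
    if len(keys) <= 1:
        return list(keys)
    half = len(keys) // 2
    return odd_even_merge(odd_even_merge_sort(keys[:half]),
                          odd_even_merge_sort(keys[half:]))
```

The sequence of comparisons depends only on the input length, not on the key values, so duplicate keys and skewed key distributions do not change the work done. That data-independence is what makes merge schemes of this kind a natural basis for load-balanced sorting.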


References

  1. S.G. Akl and N. Santoro, “Optimal Parallel Merging and Sorting Without Memory Conflicts,” IEEE Trans. on Computers, pp. 1367–69, 1987.

  2. K.E. Batcher, “Sorting Networks and their Applications,” Proc. AFIPS 1968 Spring Joint Computer Conf., Vol. 32, AFIPS Press, pp. 307–314, 1968.

  3. B. Baugsto and J. Greipsland, “Parallel Sorting Methods for Large Data on a Hypercube Database Computer,” Proc. of the 6th Intern. Workshop on Database Machines, Springer-Verlag, pp. 127–141, 1989.

  4. D. Bitton, D.J. DeWitt, D.K. Hsiao, and J. Menon, “A taxonomy of parallel sorting,” ACM Computing Surveys, Vol. 16, No. 3, pp. 287–318, 1984.

  5. G.E. Blelloch, C.E. Leiserson, B.M. Maggs, C.G. Plaxton, S.J. Smith, and M. Zagha, “A comparison of sorting algorithms for the connection machine CM-2,” Proc. ACM SPAA Conf., pp. 3–16, 1991.

  6. T.H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms, The MIT Press, McGraw-Hill Book Company, 1990.

  7. R. Cypher and J.L.C. Sanz, “Cubesort: A Parallel Algorithm for Sorting N Data Items with S-Sorters,” Journal of Algorithms, pp. 211–234, 1992.

  8. D.J. DeWitt and J. Gray, “Parallel Database Systems: The Future of Database Processing or a Passing Fad?” Sigmod Record, Vol. 19, No. 4, pp. 104–112, 1990.

  9. D.J. DeWitt and J. Gray, “Parallel database systems: the future of high performance database systems,” Comm. of the ACM, Vol. 35, No. 6, pp. 85–98, 1992.

  10. D.J. DeWitt, S. Ghandeharizadeh, D. Schneider, A. Bricker, H. Hsiao, and R. Rasmussen, “The Gamma database machine project,” IEEE TKDE, Vol. 2, No. 1, March 1990.

  11. D.J. DeWitt, R.H. Katz, F. Olken, L.D. Shapiro, M.R. Stonebraker, and D. Wood, “Implementation Techniques for Main Memory Database Systems,” Proc. ACM SIGMOD Conf., pp. 1–8, 1984.

  12. D.J. DeWitt, J.F. Naughton, and D.A. Schneider, “Parallel sorting on a shared-nothing architecture using probabilistic splitting,” Proc. of the Paral. and Distr. Inform. Systems Conf., pp. 280–291, 1991.

  13. R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Benjamin/Cummings, Menlo Park, CA, 1989.

  14. J. Gray, M. Stewart, A. Tsukerman, S. Uren, and B. Vaughan, “FASTSORT: an external sort using parallel processing,” Tandem Systems Review, Vol. 2, No. 3, pp. 57–72, Dec. 1986.

  15. J.S. Huang and Y.C. Chow, “Parallel Sorting and Data Partitioning by Sampling,” Proc. of 7th COMPSAC, pp. 627–631, 1983.

  16. B.R. Iyer, G.R. Ricard, and P.J. Varman, “Percentile finding algorithm for multiple sorted runs,” Proc. of the 15th VLDB, pp. 135–144, 1989.

  17. D.E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, Reading, MA, 1973.

  18. H.F. Korth and A. Silberschatz, Database System Concepts, 2nd ed., McGraw-Hill, 1991.

  19. T.T. Lee, “A generalized approach to distributed parallel sorting,” Technical Report CATT 93-19, March 1993, Polytechnic University; also appears in Journal of Parallel and Distributed Computing, Vol. 21, No. 3, May 1994.

  20. T. Leighton, “Tight bounds on the complexity of parallel sorting,” IEEE Trans. on Computers, No. 4, pp. 344–354, April 1985.

  21. T. Leighton, Introduction to Parallel Algorithms and Architectures, Morgan Kaufmann Publishers, San Mateo, CA, 1992.

  22. P. Mishra and M.M. Eich, “Join Processing in Relational Databases,” ACM Computing Surveys, Vol. 24, No. 1, pp. 63–113, March 1992.

  23. B. Salzberg, A. Tsukerman, J. Gray, S. Uren, and B. Vaughan, “FastSort: A Distributed Single-Input Single-Output External Sort,” Proc. ACM SIGMOD Conf., pp. 88–101, 1990.

  24. D.A. Schneider and D.J. DeWitt, “A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment,” Proc. ACM SIGMOD Conf., pp. 110–121, 1989.

  25. S. Seshadri and J.F. Naughton, “Sampling Issues in Parallel Database Systems,” Proc. of EDBT, pp. 328–343, 1992.

  26. P. Valduriez and G. Gardarin, “Join and Semijoin Algorithms for a Multiprocessor Database Machine,” ACM Trans. on Database Systems, Vol. 9, No. 1, pp. 133–161, March 1984.

  27. P.J. Varman, B.R. Iyer, and D.J. Haderle, “Parallel Merge on an Arbitrary Number of Processors,” IBM Research Report RJ-6632, San Jose, CA, Dec. 1988.

  28. C.B. Walton, A.G. Date, and R.M. Jenevein, “A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins,” Proc. of the 17th VLDB, pp. 537–548, 1991.


Additional information

Recommended by: Patrick Valduriez


About this article

Cite this article

Kumar, A., Lee, T.T. & Tsotras, V.J. A load-balanced parallel sorting algorithm for shared-nothing architectures. Distrib Parallel Databases 3, 37–68 (1995). https://doi.org/10.1007/BF01263656


  • DOI: https://doi.org/10.1007/BF01263656
