Abstract
This paper is an algorithm engineering study of cache-oblivious sorting. We empirically investigate a number of implementation issues and parameter choices for the cache-oblivious sorting algorithm Lazy Funnelsort, and we compare the final algorithm with Quicksort, the established standard for comparison-based sorting, as well as with recent cache-aware proposals. The main result is a carefully implemented cache-oblivious sorting algorithm that, as our experiments show, can be faster than the best Quicksort implementation we were able to find for input sizes well within the limits of RAM. It is also at least as fast as the recent cache-aware implementations included in the test. On disk, its advantage over Quicksort and the cache-aware algorithms is even more pronounced, whereas it is slower than a carefully implemented multiway Mergesort such as TPIE.
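The abstract names Lazy Funnelsort without describing it. The following Python sketch illustrates its two central ideas under simplifying assumptions: split the input into about n^(1/3) segments of size about n^(2/3), sort them recursively, and merge them with a binary merge tree whose nodes refill their buffers lazily, that is, only when a child buffer has run empty. The uniform buffer `capacity`, the `base` cutoff with its fallback to the built-in sort, and all names (`Node`, `fill`, `build`, `k_merge`, `lazy_funnelsort`) are hypothetical. The actual algorithm sizes its buffers recursively and lays the merge tree out in memory in a van Emde Boas-like recursive order. This sketch omits both, so it shows the control flow only and says nothing about cache behavior.

```python
import math
from collections import deque


class Node:
    """Node of a binary merge tree; `buf` is the buffer on the edge to its parent."""

    def __init__(self, capacity, left=None, right=None, source=None):
        self.capacity = capacity   # hypothetical uniform buffer size (>= 1)
        self.buf = deque()
        self.left, self.right = left, right
        self.source = source       # leaves only: iterator over one sorted run
        self.exhausted = False     # True once no further input can arrive


def fill(node):
    """Lazily fill node.buf until it is full or all input below it is used up."""
    if node.source is not None:    # leaf: copy elements from its sorted run
        while len(node.buf) < node.capacity:
            try:
                node.buf.append(next(node.source))
            except StopIteration:
                node.exhausted = True
                break
        return
    left, right = node.left, node.right
    while len(node.buf) < node.capacity:
        # Lazy rule: recurse into a child only when its buffer is empty.
        if not left.buf and not left.exhausted:
            fill(left)
        if not right.buf and not right.exhausted:
            fill(right)
        if left.buf and right.buf:
            src = left if left.buf[0] <= right.buf[0] else right
        elif left.buf:
            src = left
        elif right.buf:
            src = right
        else:                      # both children are drained for good
            node.exhausted = True
            break
        node.buf.append(src.buf.popleft())


def build(runs, capacity):
    """Build a balanced merge tree with one leaf per sorted run."""
    if len(runs) == 1:
        return Node(capacity, source=iter(runs[0]))
    mid = len(runs) // 2
    return Node(capacity, build(runs[:mid], capacity), build(runs[mid:], capacity))


def k_merge(runs, capacity=8):
    """Merge k sorted runs by repeatedly filling and draining the root buffer."""
    if not runs:
        return []
    root = build(runs, capacity)
    out = []
    while True:
        fill(root)
        if not root.buf:
            return out
        out.extend(root.buf)
        root.buf.clear()


def lazy_funnelsort(a, base=16, capacity=8):
    """Sort `a` via about n^(1/3) recursively sorted runs merged by one k-merger."""
    n = len(a)
    if n <= base:                  # small inputs: fall back to the built-in sort
        return sorted(a)
    k = math.ceil(n ** (1 / 3))
    size = math.ceil(n / k)        # each run holds about n^(2/3) elements
    runs = [lazy_funnelsort(a[i:i + size], base, capacity)
            for i in range(0, n, size)]
    return k_merge(runs, capacity)
```

For example, `k_merge([[1, 4, 7], [2, 5], [3, 6, 9, 10]], capacity=2)` merges three runs through a two-level tree. Refilling a child only when its buffer is empty is what makes the merger "lazy": a subtree does work only when its parent actually needs elements, which is the property the real algorithm combines with its recursive buffer sizes and memory layout.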
References
- Agarwal, P. K., Arge, L., Danner, A., and Holland-Minkley, B. 2003. Cache-oblivious data structures for orthogonal range searching. In Proc. 19th ACM Symposium on Computational Geometry. ACM, New York. 237--245.
- Aggarwal, A. and Vitter, J. S. 1988. The input/output complexity of sorting and related problems. Communications of the ACM 31, 9, 1116--1127.
- Arge, L. 2001. External memory data structures. In Proc. 9th Annual European Symposium on Algorithms. LNCS, vol. 2161. Springer, New York. 1--29.
- Arge, L., Bender, M. A., Demaine, E. D., Holland-Minkley, B., and Munro, J. I. 2002a. Cache-oblivious priority queue and graph algorithm applications. In Proc. 34th Annual ACM Symposium on Theory of Computing. ACM, New York. 268--276.
- Arge, L., Chase, J., Vitter, J. S., and Wickremesinghe, R. 2002b. Efficient sorting using registers and caches. ACM Journal of Experimental Algorithmics 7, 9.
- Arge, L., Brodal, G. S., and Fagerberg, R. 2005a. Cache-oblivious data structures. In Handbook of Data Structures and Applications, D. Mehta and S. Sahni, Eds. CRC Press, Boca Raton, FL. Chapter 34.
- Arge, L., Brodal, G. S., Fagerberg, R., and Laustsen, M. 2005b. Cache-oblivious planar orthogonal range searching and counting. In Proc. 21st Annual ACM Symposium on Computational Geometry. ACM, New York. 160--169.
- Arge, L., de Berg, M., and Haverkort, H. J. 2005c. Cache-oblivious R-trees. In Proc. 21st Annual ACM Symposium on Computational Geometry. ACM, New York. 170--179.
- Bayer, R. and McCreight, E. 1972. Organization and maintenance of large ordered indexes. Acta Informatica 1, 173--189.
- Bender, M., Cole, R., Demaine, E., and Farach-Colton, M. 2002a. Scanning and traversing: Maintaining data for traversals in a memory hierarchy. In Proc. 10th Annual European Symposium on Algorithms. LNCS, vol. 2461. Springer, New York. 139--151.
- Bender, M., Cole, R., and Raman, R. 2002b. Exponential structures for cache-oblivious algorithms. In Proc. 29th International Colloquium on Automata, Languages, and Programming. LNCS, vol. 2380. Springer, New York. 195--207.
- Bender, M., Demaine, E., and Farach-Colton, M. 2002c. Efficient tree layout in a multilevel memory hierarchy. In Proc. 10th Annual European Symposium on Algorithms. LNCS, vol. 2461. Springer, New York. 165--173. Full version at http://arxiv.org/abs/cs/0211010.
- Bender, M. A., Demaine, E., and Farach-Colton, M. 2000. Cache-oblivious B-trees. In Proc. 41st Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Washington D.C. 399--409.
- Bender, M. A., Duan, Z., Iacono, J., and Wu, J. 2002d. A locality-preserving cache-oblivious dynamic dictionary. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, New York. 29--39.
- Bender, M. A., Brodal, G. S., Fagerberg, R., Ge, D., He, S., Hu, H., Iacono, J., and López-Ortiz, A. 2003. The cost of cache-oblivious searching. In Proc. 44th Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Washington D.C. 271--282.
- Bender, M. A., Fineman, J. T., Gilbert, S., and Kuszmaul, B. C. 2005. Concurrent cache-oblivious B-trees. In Proc. 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures. ACM, New York. 228--237.
- Bender, M. A., Farach-Colton, M., and Kuszmaul, B. C. 2006. Cache-oblivious string B-trees. In Proc. 25th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, New York. 233--242.
- Bentley, J. L. and McIlroy, M. D. 1993. Engineering a sort function. Software--Practice and Experience 23, 11, 1249--1265.
- Brodal, G. S. 2004. Cache-oblivious algorithms and data structures. In Proc. 9th Scandinavian Workshop on Algorithm Theory. LNCS, vol. 3111. Springer, New York. 3--13.
- Brodal, G. S. and Fagerberg, R. 2002a. Cache oblivious distribution sweeping. In Proc. 29th International Colloquium on Automata, Languages, and Programming. LNCS, vol. 2380. Springer, New York. 426--438.
- Brodal, G. S. and Fagerberg, R. 2002b. Funnel heap—a cache-oblivious priority queue. In Proc. 13th Annual International Symposium on Algorithms and Computation. LNCS, vol. 2518. Springer, New York. 219--228.
- Brodal, G. S. and Fagerberg, R. 2003. On the limits of cache-obliviousness. In Proc. 35th Annual ACM Symposium on Theory of Computing. ACM, New York. 307--315.
- Brodal, G. S. and Fagerberg, R. 2006. Cache-oblivious string dictionaries. In Proc. 17th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, New York. 581--590.
- Brodal, G. S., Fagerberg, R., and Jacob, R. 2002c. Cache oblivious search trees via binary trees of small height. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, New York. 39--48.
- Brodal, G. S., Fagerberg, R., Meyer, U., and Zeh, N. 2004. Cache-oblivious data structures and algorithms for undirected breadth-first search and shortest paths. In Proc. 9th Scandinavian Workshop on Algorithm Theory. LNCS, vol. 3111. Springer, New York. 480--492.
- Brodal, G. S., Fagerberg, R., and Moruz, G. 2005. Cache-aware and cache-oblivious adaptive sorting. In Proc. 32nd International Colloquium on Automata, Languages, and Programming. LNCS, vol. 3580. Springer, New York. 576--588.
- Chowdhury, R. A. and Ramachandran, V. 2004. Cache-oblivious shortest paths in graphs using buffer heap. In Proc. 16th Annual ACM Symposium on Parallelism in Algorithms and Architectures. ACM, New York.
- Chowdhury, R. A. and Ramachandran, V. 2006. Cache-oblivious dynamic programming. In Proc. 17th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, New York. 591--600.
- Department of Computer Science, Duke University. 2002. TPIE: a transparent parallel I/O environment. WWW page, http://www.cs.duke.edu/TPIE/.
- Fagerberg, R., Pagh, A., and Pagh, R. 2006. External string sorting: Faster and cache-oblivious. In Proc. 23rd Annual Symposium on Theoretical Aspects of Computer Science. LNCS, vol. 3884. Springer, New York. 68--79.
- Farzan, A., Ferragina, P., Franceschini, G., and Munro, J. I. 2005. Cache-oblivious comparison-based algorithms on multisets. In Proc. 13th Annual European Symposium on Algorithms. LNCS, vol. 3669. Springer, New York. 305--316.
- Franceschini, G. 2004. Proximity mergesort: Optimal in-place sorting in the cache-oblivious model. In Proc. 15th Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, New York. 291--299.
- Franceschini, G. and Grossi, R. 2003a. Optimal cache-oblivious implicit dictionaries. In Proc. 30th International Colloquium on Automata, Languages, and Programming. LNCS, vol. 2719. Springer, New York. 316--331.
- Franceschini, G. and Grossi, R. 2003b. Optimal worst-case operations for implicit cache-oblivious search trees. In Proc. 8th International Workshop on Algorithms and Data Structures. LNCS, vol. 2748. Springer, New York. 114--126.
- Frigo, M., Leiserson, C. E., Prokop, H., and Ramachandran, S. 1999. Cache-oblivious algorithms. In Proc. 40th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Washington D.C. 285--297.
- Gray, J. 2003. Sort benchmark home page. WWW page, http://research.microsoft.com/barc/SortBenchmark/.
- Hwang, F. K. and Lin, S. 1972. A simple algorithm for merging two disjoint linearly ordered sets. SIAM Journal on Computing 1, 1, 31--39.
- Jampala, H. and Zeh, N. 2005. Cache-oblivious planar shortest paths. In Proc. 32nd International Colloquium on Automata, Languages, and Programming. LNCS, vol. 3580. Springer, New York. 563--575.
- Knuth, D. E. 1998. The Art of Computer Programming, Vol 3, Sorting and Searching, 2nd ed. Addison-Wesley, Reading, MA.
- Ladner, R. E., Fortna, R., and Nguyen, B.-H. 2002. A comparison of cache aware and cache oblivious static search trees using program instrumentation. In Experimental Algorithmics. LNCS, vol. 2547. Springer, New York. 78--92.
- LaMarca, A. and Ladner, R. E. 1999. The influence of caches on the performance of sorting. Journal of Algorithms 31, 66--104.
- Prokop, H. 1999. Cache-oblivious algorithms. M.S. thesis, Massachusetts Institute of Technology.
- Rahman, N., Cole, R., and Raman, R. 2001. Optimised predecessor data structures for internal memory. In Proc. 5th International Workshop on Algorithm Engineering. LNCS, vol. 2141. 67--78.
- Sanders, P. 2000. Fast priority queues for cached memory. ACM Journal of Experimental Algorithmics 5, 7.
- Sedgewick, R. 1998. Algorithms in C++: Parts 1--4: Fundamentals, Data Structures, Sorting, Searching, third ed. Addison-Wesley, Reading, MA. Code available at http://www.cs.princeton.edu/~rs/Algs3.cxx1-4/code.txt.
- Vinther, K. 2003. Engineering cache-oblivious sorting algorithms. M.S. thesis, Department of Computer Science, University of Aarhus, Denmark. Available online at http://kristoffer.vinther.name/academia/thesis/.
- Vitter, J. S. 2001. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys 33, 2, 209--271.
- Williams, J. W. J. 1964. Algorithm 232: Heapsort. Communications of the ACM 7, 347--348.
- Xiao, L., Zhang, X., and Kubricht, S. A. 2000. Improving memory performance of sorting algorithms. ACM Journal of Experimental Algorithmics 5, 3.
Index Terms
- Engineering a cache-oblivious sorting algorithm