ABSTRACT
We present a new simple wait-free universal construction, called Sim, that uses just a Fetch&Add and an LL/SC object and performs a constant number of shared memory accesses. We have implemented SIM in a real shared-memory machine. In theory terms, our practical version of SIM, called P-SIM, has worse complexity than its theoretical analog; in practice though, we experimentally show that P-SIM outperforms several state-of-the-art lock-based and lock-free techniques, and this given that it is wait-free, i.e., that it satisfies a stronger progress condition than all the algorithms it outperforms.
We have used P-SIM to get highly-efficient wait-free implementations of stacks and queues. Our experiments show that our implementations outperform the currently state-of-the-art shared stack and queue implementations which ensure only weaker progress properties than wait-freedom.
- Yehuda Afek, Dalia Dauber, and Dan Touitou. Wait-free made fast. In Proceedings of the 27th ACM Symposium on Theory of Computing, pages 538--547, 1995. Google ScholarDigital Library
- Yehuda Afek, Gideon Stupp, and Dan Touitou. Long-lived adaptive collect with applications. In Proceedings of the 40th Symposium on Foundations of Computer Science, pages 262--272, 1999. Google ScholarDigital Library
- James H. Anderson and Mark Moir. Universal constructions for multi-object operations. In Proceedings of the 14th ACM Symposium on Principles of Distributed Computing, pages 184--193, 1995. Google ScholarDigital Library
- James H. Anderson and Mark Moir. Universal constructions for large objects. IEEE Transactions on Parallel and Distributed Systems, 10(12):1317--1332, dec 1999. Google ScholarDigital Library
- Hagit Attiya, Rachid Guerraoui, and Eric Ruppert. Partial snapshot objects. In Proceedings of the 20th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 336--343, 2008. Google ScholarDigital Library
- Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. Hoard: A scalable memory allocator for multithreaded applications. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 117--128, 2000. Google ScholarDigital Library
- Phong Chuong, Faith Ellen, and Vijaya Ramachandran. A universal construction for wait-free transaction friendly data structures. In Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pages 335--344, 2010. Google ScholarDigital Library
- Pat Conway, Nathan Kalyanasundharam, Gregg Donley, Kevin Lepak, and Bill Hughes. Blade computing with the amd opteron processor (magny-cours). Hot chips 21, August 2009.Google Scholar
- T. S. Craig. Building fifo and priority-queueing spin locks from atomic swap. Technical Report TR 93-02-02, Department of Computer Science, University of Washington, February 1993.Google Scholar
- Panagiota Fatourou and Nikolaos D. Kallimanis. The RedBlue adaptive universal constractions. In Proceedings of the 23rd International Symposium on Distributed Computing, pages 127--141, 2009. Google ScholarDigital Library
- Panagiota Fatourou and Nikolaos D. Kallimanis. Fast implementations of shared objects using fetch&add. Technical Report TR 02-2010, Department of Computer Science, University of Ioannina, February 2010.Google Scholar
- D. George S. Harvey W. Kleinfelder K. McAuliffe E. Melton V. Norton G. Pfister, W. Brantley and J. Weiss. The ibm research parallel processor prototype (rp3): Introduction and architecture. pages 764--771, 1985.Google Scholar
- P. Heidelberger, A. Norton, and John T. Robinson. Parallel quicksort using fetch-and-add. IEEE Transactions on Computers., 39(1):133--138, 1990. Google ScholarDigital Library
- Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. The code for flat combining. http://github.com/mit-carbon/flat-combining.Google Scholar
- Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. Flat combining and the synchronization-parallelism tradeoff. In Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pages 355--364, 2010. Google ScholarDigital Library
- Danny Hendler, Nir Shavit, and Lena Yerushalmi. A scalable lock-free stack algorithm. In Proceedings of the 16th ACM Symposium on Parallel Algorithms and Architectures, pages 206--215, 2004. Google ScholarDigital Library
- Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems (TOPLAS), 13:124--149, jan 1991. Google ScholarDigital Library
- Maurice Herlihy. A methodology for implementing highly concurrent data objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 15(5):745--770, nov 1993. Google ScholarDigital Library
- Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems (TOPLAS), 12:463--492, 1990. Google ScholarDigital Library
- Damien Imbs and Michel Raynal. Help when needed, but no more: Efficient read/write partial snapshot. In Proceedings of the 23rd International Symposium on Distributed Computing, pages 142--156. Springer, 2009. Google ScholarDigital Library
- Prasad Jayanti. A time complexity lower bound for randomized implementations of some shared objects. In Proceedings of the 17th ACM Symposium on Principles of Distributed Computing, pages 201--210, 1998. Google ScholarDigital Library
- Peter S. Magnusson, Anders Landin, and Erik Hagersten. Queue locks on cache coherent multiprocessors. In Proceedings of the 8th International Parallel Processing Symposium, pages 165--171, 1994. Google ScholarDigital Library
- John M. Mellor-Crummey and Michael L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1):21--65, 1991. Google ScholarDigital Library
- Maged M. Michael and Michael L. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, pages 267--275, 1996. Google ScholarDigital Library
- Dimitrios S. Nikolopoulos and Theodore S. Papatheodorou. A quantitative architectural evaluation of synchronization algorithms and disciplines on ccnuma systems: the case of the sgi origin2000. In Proceedings of the 13th international conference on Supercomputing (ICS '99), pages 319--328, New York, NY, USA, 1999. ACM. Google ScholarDigital Library
- Ori Shalev and Nir Shavit. Predictive log-synchronization. In EuroSys, pages 305--315, 2006. Google ScholarDigital Library
- Nir Shavit and Asaph Zemach. Combining funnels: A dynamic approach to software combining. Journal of Parallel and Distributed Computing, 60(11):1355--1387, 2000. Google ScholarDigital Library
- Gadi Taubenfeld. Synchronization Algorithms and Concurrent Programming. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2006. Google ScholarDigital Library
- R. K. Treiber. Systems programming: Coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center, April 1986.Google Scholar
- Pen-Chung Yew, Nian-Feng Tzeng, and D.H. Lawrie. Distributing hot-spot addressing in large-scale multiprocessors. IEEE Transactions on Computers, C-36(4):388 --395, April 1987. Google ScholarDigital Library
Index Terms
- A highly-efficient wait-free universal construction
Recommendations
Highly-Efficient Wait-Free Synchronization
We study a simple technique, originally presented by Herlihy (ACM Trans. Program. Lang. Syst. 15(5):745---770, 1993 ), for executing concurrently, in a wait-free manner, blocks of code that have been programmed for sequential execution and require ...
A methodology for creating fast wait-free data structures
PPOPP '12Lock-freedom is a progress guarantee that ensures overall program progress. Wait-freedom is a stronger progress guarantee that ensures the progress of each thread in the program. While many practical lock-free algorithms exist, wait-free algorithms are ...
A methodology for creating fast wait-free data structures
PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel ProgrammingLock-freedom is a progress guarantee that ensures overall program progress. Wait-freedom is a stronger progress guarantee that ensures the progress of each thread in the program. While many practical lock-free algorithms exist, wait-free algorithms are ...
Comments