ABSTRACT
Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, whereas traditional algorithms had to choose one or the other. With Byzantine failures, we give an algorithm that requires only $n \geq 2f_P + 1$ processes (where $f_P$ is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that requires only $n \geq f_P + 1$ processes and also decides in two delays. Both algorithms tolerate failures of a minority of the memories, a failure mode inherent to RDMA, and they provide safety in asynchronous systems and liveness under standard additional assumptions.
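As a rough illustration of the resilience bounds stated in the abstract, the following Python sketch computes the smallest number of processes that satisfies each bound for a given number of faulty processes. The function names and model labels are hypothetical and chosen only for this example. The message-passing entries (n >= 3f_P + 1 for Byzantine failures, n >= 2f_P + 1 for crash failures) are the classical bounds, included here as a comparison point and not taken from the abstract. The memory helper assumes that "a minority of memory failures" means strictly fewer than half of the memories.

```python
def min_processes(f_p: int, model: str) -> int:
    """Smallest n that satisfies the resilience bound for the given model.

    Model labels are hypothetical names used only in this sketch.
    """
    if f_p < 0:
        raise ValueError("f_p must be non-negative")
    bounds = {
        # Bounds stated in the abstract for the RDMA algorithms.
        "rdma_byzantine": 2 * f_p + 1,
        "rdma_crash": f_p + 1,
        # Classical message-passing bounds, shown for comparison (assumption:
        # not part of the abstract).
        "msg_byzantine": 3 * f_p + 1,
        "msg_crash": 2 * f_p + 1,
    }
    return bounds[model]


def max_memory_failures(m: int) -> int:
    """Largest number of faulty memories that is still a strict minority of m."""
    if m < 1:
        raise ValueError("m must be positive")
    return (m - 1) // 2


if __name__ == "__main__":
    for f_p in (1, 2, 3):
        row = {model: min_processes(f_p, model)
               for model in ("rdma_byzantine", "msg_byzantine",
                             "rdma_crash", "msg_crash")}
        print(f"f_P={f_p}: {row}")
```

For example, tolerating one Byzantine process needs three processes under the abstract's bound, compared with four under the classical message-passing bound.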