research-article

Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol

Authors:
Antonios Katsarakis (University of Edinburgh, Edinburgh, United Kingdom)
Vasilis Gavrielatos (University of Edinburgh, Edinburgh, United Kingdom)
M.R. Siavash Katebzadeh (University of Edinburgh, Edinburgh, United Kingdom)
Arpit Joshi (Intel, Portland, OR, USA)
Aleksandar Dragojevic (Microsoft Research, Cambridge, United Kingdom)
Boris Grot (University of Edinburgh, Edinburgh, United Kingdom)
Vijay Nagarajan (University of Edinburgh, Edinburgh, United Kingdom)

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2020, Pages 201–217. https://doi.org/10.1145/3373376.3378496

Published: 13 March 2020

ABSTRACT

Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with a reliable replication protocol that keeps the replicas strongly consistent even when faults occur. Strong consistency is preferred over weaker consistency models, which cannot guarantee intuitive behavior for clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency.

This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully-concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort) and provide fault-tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6X lower than that of CRAQ and ZAB.
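To make the write path described above more concrete, here is a minimal single-process sketch in Python of invalidation-based replication with logical timestamps. It is not the authors' implementation: the class names (`Replica`, `Cluster`, `KeyState`), the message names (`INV`, `ACK`, `VAL`), and the rule that a coordinator waits for ACKs from every other replica are assumptions made for illustration. The sketch omits failure detection, membership changes, write replays, and any extra coordinator states the full protocol may use. It only shows three ideas from the abstract: reads are served locally when a key is valid, any replica can coordinate a write, and concurrent writes are ordered by comparing `(version, node_id)` timestamps at each replica, so neither write aborts.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Deque, Dict, List, Optional, Set, Tuple

# Hypothetical logical timestamp (version, node_id). Tuples compare
# lexicographically, so concurrent writes to one key get a total order locally.
Timestamp = Tuple[int, int]

class State(Enum):
    VALID = "valid"      # value is current; reads are served locally
    INVALID = "invalid"  # an INV arrived; waiting for the matching VAL
    WRITE = "write"      # this replica coordinates a write and awaits ACKs

@dataclass
class KeyState:
    value: str = ""
    ts: Timestamp = (0, 0)
    state: State = State.VALID

class Replica:
    def __init__(self, node_id: int, cluster: "Cluster") -> None:
        self.node_id = node_id
        self.cluster = cluster
        self.store: Dict[str, KeyState] = {}
        self.pending_acks: Dict[Tuple[str, Timestamp], Set[int]] = {}

    def _get(self, key: str) -> KeyState:
        return self.store.setdefault(key, KeyState())

    def read(self, key: str) -> Optional[str]:
        """Local read: the value if the key is Valid, else None (caller retries)."""
        ks = self._get(key)
        return ks.value if ks.state is State.VALID else None

    def write(self, key: str, value: str) -> bool:
        """Coordinate a write; returns False (caller retries) if the key is not Valid."""
        ks = self._get(key)
        if ks.state is not State.VALID:
            return False
        ts = (ks.ts[0] + 1, self.node_id)
        ks.value, ks.ts, ks.state = value, ts, State.WRITE
        self.pending_acks[(key, ts)] = set()
        self.cluster.broadcast(self.node_id, ("INV", key, ts, value, self.node_id))
        return True

    def on_message(self, msg: tuple) -> None:
        kind, key, ts = msg[0], msg[1], msg[2]
        ks = self._get(key)
        if kind == "INV":
            value, sender = msg[3], msg[4]
            if ts > ks.ts:  # higher timestamp wins; a lower one is ignored, not aborted
                ks.value, ks.ts, ks.state = value, ts, State.INVALID
            self.cluster.send(sender, ("ACK", key, ts, self.node_id))
        elif kind == "ACK":
            acks = self.pending_acks.get((key, ts))
            if acks is None:
                return
            acks.add(msg[3])
            if len(acks) == self.cluster.size - 1:  # every other replica has ACKed
                del self.pending_acks[(key, ts)]
                if ks.ts == ts:  # not superseded by a concurrent higher-timestamp write
                    ks.state = State.VALID
                    self.cluster.broadcast(self.node_id, ("VAL", key, ts))
        elif kind == "VAL" and ks.ts == ts:
            ks.state = State.VALID

class Cluster:
    """Stand-in for the network: a FIFO queue of (destination, message) pairs."""

    def __init__(self, n: int) -> None:
        self.size = n
        self.replicas: List[Replica] = [Replica(i, self) for i in range(n)]
        self.queue: Deque[Tuple[int, tuple]] = deque()

    def send(self, dest: int, msg: tuple) -> None:
        self.queue.append((dest, msg))

    def broadcast(self, src: int, msg: tuple) -> None:
        for r in self.replicas:
            if r.node_id != src:
                self.send(r.node_id, msg)

    def run(self) -> None:
        while self.queue:
            dest, msg = self.queue.popleft()
            self.replicas[dest].on_message(msg)

# Two replicas write the same key concurrently; no leader serializes them.
cluster = Cluster(3)
cluster.replicas[0].write("x", "a")  # timestamp (1, 0)
cluster.replicas[2].write("x", "b")  # timestamp (1, 2), wins the tie on node_id
cluster.run()
print([r.read("x") for r in cluster.replicas])  # ['b', 'b', 'b']
```

In this sketch the write stamped `(1, 0)` still collects all of its ACKs and completes; it is simply ordered before the `(1, 2)` write, which is why no write has to abort when two coordinators race on the same key.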


Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
  2. Dependable and fault-tolerant systems and networks
    1. Availability
    2. Reliability
2. Software and its engineering
  1. Software organization and properties
    1. Software functional properties
      1. Correctness
        1. Consistency


Published in
ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2020
1412 pages
ISBN: 9781450371025
DOI: 10.1145/3373376
General Chair: James Larus (EPFL)
Program Chairs: Luis Ceze (University of Washington), Karin Strauss (Microsoft)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Artifacts Available
- Artifacts Evaluated & Functional
Author Tags
availability
consistency
fault-tolerant
latency
linearizability
rdma
replication
throughput
Acceptance Rates
Overall Acceptance Rate: 535 of 2,713 submissions, 20%