DOI: 10.1145/3373376.3378496

Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol

Published: 13 March 2020

ABSTRACT

Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with a reliable replication protocol that keeps the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models, which cannot guarantee intuitive behavior for clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency.

This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort), and provide fault tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6x lower than that of CRAQ and ZAB.
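
The abstract names the mechanism but does not spell out its steps, so the following is only a minimal single-process sketch of the general idea, not the authors' implementation. It assumes that each key carries a logical timestamp of the form (version, node_id) compared lexicographically, that a writer broadcasts an invalidation carrying the new value and timestamp, and that a later validation message makes the key readable again. All names (`Replica`, `on_inv`, `on_val`, `write`) are hypothetical, and networking, failures, and write replay are left out.

```python
from dataclasses import dataclass

VALID, INVALID = "valid", "invalid"


@dataclass
class Entry:
    value: object = None
    ts: tuple = (0, 0)   # assumed format: (version, node_id), compared lexicographically
    state: str = VALID


class Replica:
    """Hypothetical replica holding a key-value store with per-key timestamps."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}

    def entry(self, key):
        return self.store.setdefault(key, Entry())

    def next_ts(self, key):
        # A writer picks a timestamp higher than the one it currently stores.
        return (self.entry(key).ts[0] + 1, self.node_id)

    def on_inv(self, key, ts, value):
        # Invalidation: adopt the write only if its timestamp is higher.
        # Every replica applies the same rule, so conflicts resolve locally.
        e = self.entry(key)
        if ts > e.ts:
            e.value, e.ts, e.state = value, ts, INVALID
        return True  # acknowledgement

    def on_val(self, key, ts):
        # Validation: only re-enable reads for the timestamp currently held.
        e = self.entry(key)
        if e.ts == ts:
            e.state = VALID

    def read(self, key):
        # Local read; returns None while the key is invalidated
        # (a real replica would stall the read instead).
        e = self.entry(key)
        return e.value if e.state == VALID else None


def write(coordinator, replicas, key, value):
    """Failure-free write: invalidate everywhere, then validate everywhere."""
    ts = coordinator.next_ts(key)
    acks = [r.on_inv(key, ts, value) for r in replicas]
    if all(acks):
        for r in replicas:
            r.on_val(key, ts)
    return ts
```

In this sketch, two writers that pick timestamps for the same key concurrently never need a central ordering point: whichever order their invalidations arrive in, every replica ends up holding the value with the higher timestamp, and only the validation for that timestamp makes the key readable again.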


Published in

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2020, 1412 pages
ISBN: 9781450371025
DOI: 10.1145/3373376

Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article

Overall Acceptance Rate: 535 of 2,713 submissions, 20%
