ABSTRACT
Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that keeps the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models, which cannot guarantee intuitive behavior for clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency.
This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (so that writes never abort), and provide fault tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6× lower than that of CRAQ and ZAB.
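To make the idea of coupling logical timestamps with invalidations more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify message formats, so the invalidate/acknowledge/validate flow, the `(version, node_id)` timestamp layout, and every name below (`Replica`, `on_inv`, `on_val`, `start_write`, `demo`) are assumptions made for illustration. The model is a single process with no networking, failures, or write replay.

```python
from dataclasses import dataclass


@dataclass
class KeyState:
    value: object = None
    ts: tuple = (0, 0)   # assumed layout: (version, node_id), compared lexicographically
    valid: bool = True   # False while an invalidation is outstanding


class Replica:
    """Hypothetical replica holding per-key state (illustrative only)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}

    def _state(self, key):
        return self.store.setdefault(key, KeyState())

    def read(self, key):
        # Local read: served only while the key is valid.
        st = self._state(key)
        if not st.valid:
            raise RuntimeError("key is invalidated; the read must wait")
        return st.value

    def on_inv(self, key, ts, value):
        # Resolve conflicts locally: apply only a higher timestamp.
        st = self._state(key)
        if ts > st.ts:
            st.value, st.ts, st.valid = value, ts, False
        return True  # always acknowledge, so the writer never aborts

    def on_val(self, key, ts):
        # Validate only if no newer write superseded this timestamp.
        st = self._state(key)
        if st.ts == ts:
            st.valid = True


def start_write(coord, key, value):
    # Any replica may coordinate a write; it picks the next timestamp locally.
    st = coord._state(key)
    ts = (st.ts[0] + 1, coord.node_id)
    st.value, st.ts, st.valid = value, ts, False
    return ts


def broadcast_inv(coord, replicas, key, ts, value):
    # Returns True once every other replica has acknowledged.
    return all(r.on_inv(key, ts, value) for r in replicas if r is not coord)


def broadcast_val(replicas, key, ts):
    for r in replicas:
        r.on_val(key, ts)


def demo():
    a, b, c = Replica(1), Replica(2), Replica(3)
    replicas = [a, b, c]
    # Two writes to the same key start concurrently at different replicas.
    ts_a = start_write(a, "k", "from-a")
    ts_b = start_write(b, "k", "from-b")
    assert broadcast_inv(a, replicas, "k", ts_a, "from-a")
    assert broadcast_inv(b, replicas, "k", ts_b, "from-b")
    broadcast_val(replicas, "k", ts_a)  # superseded everywhere, so a no-op
    broadcast_val(replicas, "k", ts_b)
    return [r.read("k") for r in replicas]
```

In this toy run both writes complete without aborting, and every replica converges on the write with the higher timestamp because each replica compares timestamps on its own, with no central ordering point. Reads are refused while a key is invalidated, which is one way a local read can avoid returning a value that a concurrent write is about to replace.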
Index Terms
- Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol