ABSTRACT
The trend towards global applications and services has created an increasing demand for transaction processing on globally-distributed data. Many database systems, such as Spanner and CockroachDB, support distributed transactions but require a large number of wide-area network round trips to commit each transaction and ensure that the transaction's state is durably replicated across multiple datacenters. This can significantly increase transaction completion time, leading developers to replace database-level transactions with their own error-prone application-level solutions.
This paper introduces Carousel, a distributed database system that provides low-latency transaction processing for multi-partition globally-distributed transactions. Carousel shortens transaction processing time by reducing the number of sequential wide-area network round trips required to commit a transaction and replicate its results while maintaining serializability. This is made possible in part by using information about a transaction's potential write set to enable transaction processing, including any necessary remote read operations, to overlap with 2PC and state replication. Carousel further reduces transaction completion time by introducing a consensus protocol that can perform state replication in parallel with 2PC. For a multi-partition 2-round Fixed-set Interactive (2FI) transaction, Carousel requires at most two wide-area network round trips to commit the transaction when there are no failures, and only one round trip in the common case if local replicas are available.
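The benefit of overlapping phases can be illustrated with a toy latency model. The sketch below is not Carousel's protocol: the `Phase` type, the three phase names, and the assumption that each phase costs exactly one wide-area round trip are hypothetical choices made only for illustration. It shows that phases run back to back contribute the sum of their round trips to the critical path, while phases that overlap contribute only the maximum.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Phase:
    """One step of a hypothetical commit path and its WAN round-trip cost."""
    name: str
    wan_round_trips: int


def sequential_round_trips(phases: Sequence[Phase]) -> int:
    """Critical-path round trips when every phase waits for the previous one."""
    return sum(p.wan_round_trips for p in phases)


def overlapped_round_trips(stages: Sequence[Sequence[Phase]]) -> int:
    """Critical-path round trips when phases within a stage run concurrently.

    Stages still run one after another, so each stage costs as much as its
    slowest phase (an empty stage costs nothing).
    """
    return sum(
        max((p.wan_round_trips for p in stage), default=0) for stage in stages
    )


# Illustrative assumption: each phase costs one WAN round trip.
REMOTE_READS = Phase("remote reads", 1)
PREPARE = Phase("2PC prepare", 1)
REPLICATION = Phase("state replication", 1)

if __name__ == "__main__":
    all_phases = [REMOTE_READS, PREPARE, REPLICATION]
    print("sequential:", sequential_round_trips(all_phases))
    print("partly overlapped:",
          overlapped_round_trips([[REMOTE_READS, PREPARE], [REPLICATION]]))
    print("fully overlapped:", overlapped_round_trips([all_phases]))
```

Under these assumed costs, the more phases share a stage, the fewer sequential round trips remain on the critical path. Which phases can actually be overlapped safely, and under what conditions, is determined by the protocol itself and is not captured by this model.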
Index Terms
- Carousel: Low-Latency Transaction Processing for Globally-Distributed Data