Abstract
Highly available database systems rely on data replication to tolerate machine failures. Both classes of existing replication algorithms, active-passive and active-active, were designed at a time when the network was the dominant performance bottleneck. In essence, these techniques minimize network communication between replicas at the cost of more redundant processing, a trade-off that fit the conventional wisdom of distributed database design. However, the emergence of next-generation networks with high throughput and low latency calls for revisiting these assumptions.
In this paper, we first make the case that in modern RDMA-enabled networks the bottleneck has shifted to CPUs, and that existing network-optimized replication techniques are therefore no longer optimal. We present Active-Memory Replication, a new high-availability scheme that efficiently leverages RDMA to completely eliminate the processing redundancy in replication. With Active-Memory, all replicas dedicate their processing power to executing new transactions rather than performing redundant computation. Active-Memory maintains high availability and correctness in the presence of failures through an efficient RDMA-based undo-logging scheme. Our evaluation against active-passive and active-active schemes shows that Active-Memory is up to a factor of 2 faster than the second-best protocol on RDMA-based networks.