research-article

Rethinking database high availability with RDMA networks

Published: 01 July 2019

Abstract

Highly available database systems rely on data replication to tolerate machine failures. Both classes of existing replication algorithms, active-passive and active-active, were designed at a time when the network was the dominant performance bottleneck. In essence, these techniques aim to minimize network communication between replicas at the cost of incurring more processing redundancy, a trade-off that fit the conventional wisdom of distributed database design. However, the emergence of next-generation networks with high throughput and low latency calls for revisiting these assumptions.
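
The trade-off described above can be illustrated with a toy Python model. It is a sketch under our own assumptions, not code from the paper: `Replica`, `execute`, `active_active`, `active_passive`, and the cost weights `EXEC_COST` and `APPLY_COST` are hypothetical, and "work" is an abstract unit rather than a measured cost. Active-active ships the transaction and has every replica execute it; active-passive lets the primary execute and ships its log of writes, which each backup must still replay.

```python
from dataclasses import dataclass, field

# Hypothetical cost weights: executing an operation (reads, logic,
# concurrency control) is assumed to cost more than applying a logged write.
EXEC_COST = 3
APPLY_COST = 1


@dataclass
class Replica:
    store: dict = field(default_factory=dict)
    work: int = 0  # abstract CPU work performed on this replica


def execute(store, txn):
    """Hypothetical transaction logic: add each delta to its key; return the write set."""
    writes = {}
    for key, delta in txn:
        writes[key] = writes.get(key, store.get(key, 0)) + delta
    return writes


def active_active(replicas, txn):
    """Ship the transaction itself; every replica re-executes it."""
    for r in replicas:
        r.store.update(execute(r.store, txn))
        r.work += EXEC_COST * len(txn)


def active_passive(replicas, txn):
    """The primary executes; backups receive its log of writes and replay it."""
    primary, backups = replicas[0], replicas[1:]
    log = execute(primary.store, txn)
    primary.store.update(log)
    primary.work += EXEC_COST * len(txn)
    for b in backups:
        b.store.update(log)
        b.work += APPLY_COST * len(log)  # replay is cheaper, but not free


if __name__ == "__main__":
    txn = [("x", 5), ("y", 2), ("x", 1)]
    aa = [Replica() for _ in range(3)]
    ap = [Replica() for _ in range(3)]
    active_active(aa, txn)
    active_passive(ap, txn)
    print([r.work for r in aa])  # [9, 9, 9]
    print([r.work for r in ap])  # [9, 2, 2]
```

Both schemes reach the same final state, and in both the backups spend CPU on every transaction; this is the processing redundancy the abstract refers to.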

In this paper, we first make the case that in modern RDMA-enabled networks the bottleneck has shifted to the CPUs, and that the existing network-optimized replication techniques are therefore no longer optimal. We present Active-Memory Replication, a new high availability scheme that efficiently leverages RDMA to completely eliminate processing redundancy in replication. With Active-Memory, all replicas dedicate their processing power to executing new transactions rather than performing redundant computation. Active-Memory maintains high availability and correctness in the presence of failures through an efficient RDMA-based undo-logging scheme. Our evaluation against active-passive and active-active schemes shows that Active-Memory is up to twice as fast as the second-best protocol on RDMA-based networks.
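
The undo-logging idea can be sketched in Python under heavy simplifying assumptions. Everything here is illustrative, not the paper's protocol: RDMA is simulated in-process by a hypothetical `one_sided_write` helper that mutates a replica's memory without running any code on that replica, and `ReplicaMemory`, `replicate`, `recover`, and `crash_after` are names introduced for the example. The primary first writes undo entries (old values) into every replica's log, then installs the new values in place, and finally writes a commit marker; a surviving replica rolls back any transaction that has undo entries but no commit marker.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaMemory:
    """Memory a remote node writes into directly; the owner's CPU is not involved on the write path."""
    data: dict = field(default_factory=dict)
    undo_log: list = field(default_factory=list)  # (txn_id, key, old_value) entries
    committed: set = field(default_factory=set)


def one_sided_write(mem, region, item):
    """In-process stand-in for a one-sided RDMA WRITE (hypothetical helper).

    Assumes every write lands atomically and in issue order."""
    if region == "undo":
        mem.undo_log.append(item)
    elif region == "data":
        key, value = item
        mem.data[key] = value
    elif region == "commit":
        mem.committed.add(item)
    else:
        raise ValueError(f"unknown region: {region}")


def replicate(primary, backups, txn_id, writes, crash_after=None):
    """Primary-side commit path for an already-executed write set.

    crash_after (hypothetical) simulates a primary crash after that many data writes.
    Returns True if the transaction committed."""
    nodes = [primary] + backups
    # 1. Log the old values everywhere before touching any data.
    for key in writes:
        old = primary.data.get(key)  # None means the key did not exist
        for mem in nodes:
            one_sided_write(mem, "undo", (txn_id, key, old))
    # 2. Install the new values in place on every replica.
    done = 0
    for key, value in writes.items():
        for mem in nodes:
            if crash_after is not None and done == crash_after:
                return False  # crash: some replicas are only partially updated
            one_sided_write(mem, "data", (key, value))
            done += 1
    # 3. Mark the transaction committed on every replica.
    for mem in nodes:
        one_sided_write(mem, "commit", txn_id)
    return True


def recover(mem):
    """Run on a surviving replica after a failure: undo uncommitted writes, newest first."""
    for txn_id, key, old in reversed(mem.undo_log):
        if txn_id in mem.committed:
            continue
        if old is None:
            mem.data.pop(key, None)
        else:
            mem.data[key] = old
    mem.undo_log = [e for e in mem.undo_log if e[0] in mem.committed]


if __name__ == "__main__":
    p, b1, b2 = (ReplicaMemory({"x": 1}) for _ in range(3))
    replicate(p, [b1, b2], 1, {"x": 10, "y": 20})
    replicate(p, [b1, b2], 2, {"x": 99, "z": 5}, crash_after=4)
    print(b1.data)  # {'x': 99, 'y': 20}  (txn 2 partially applied)
    recover(b1)
    print(b1.data)  # {'x': 10, 'y': 20}
```

In this sketch the backups run code only in `recover`, never on the normal write path, which mirrors the property the abstract highlights. Real RDMA adds concerns the sketch ignores, such as write ordering across queue pairs, completion handling, and durability.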

Published in

Proceedings of the VLDB Endowment, Volume 12, Issue 11
July 2019
543 pages

Publisher

VLDB Endowment
