DOI: 10.1145/1966445.1966457
research-article

ZZ and the art of practical BFT execution

Published: 10 April 2011

ABSTRACT

The high replication cost of Byzantine fault-tolerance (BFT) methods has been a major barrier to their widespread adoption in commercial distributed applications. We present ZZ, a new approach that reduces the replication cost of BFT services from 2f+1 to practically f+1. The key insight in ZZ is to use f+1 execution replicas in the normal case and to activate additional replicas only upon failures. In data centers where multiple applications share a physical server, ZZ reduces the aggregate number of execution replicas running in the data center, improving throughput and response times. ZZ relies on virtualization, a technology already employed in modern data centers, for fast replica activation upon failures, and enables newly activated replicas to begin processing requests immediately by fetching state on demand. A prototype implementation of ZZ using the BASE library and Xen shows that, compared to a system with 2f+1 replicas, our approach yields lower response times and up to 33% higher throughput in a prototype data center with four BFT web applications. We also show that ZZ can handle simultaneous failures and achieve sub-second recovery.
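
The normal-case and failure-case behavior described above can be illustrated with a small, single-process simulation. This is a minimal sketch under stated assumptions, not the ZZ implementation (which builds on BASE and Xen): replicas are modeled as deterministic Python functions, a "dormant" replica stands in for a paused virtual machine, a reply is accepted once f+1 replicas return the same value, and dormant replicas are woken one at a time. The names `ZZExecutor`, `replicas_in_normal_case`, `correct_replica`, and `faulty_replica` are hypothetical, and on-demand state fetching is not modeled.

```python
from collections import Counter
from typing import Callable, Hashable, List

# Assumption: a replica is a deterministic function from request to reply.
Replica = Callable[[str], Hashable]


def replicas_in_normal_case(f: int, zz: bool) -> int:
    """Execution replicas kept running per application when no fault occurs."""
    return f + 1 if zz else 2 * f + 1


class ZZExecutor:
    """Hypothetical sketch: run f+1 active replicas and wake dormant ones
    only when the active replicas' replies do not reach f+1 matches."""

    def __init__(self, f: int, replicas: List[Replica]) -> None:
        if len(replicas) < 2 * f + 1:
            raise ValueError("need at least 2f+1 replicas (f+1 active, f dormant)")
        self.f = f
        self.active = list(replicas[: f + 1])   # running in the normal case
        self.dormant = list(replicas[f + 1 :])  # stand-ins for paused VMs
        self.activations = 0

    def execute(self, request: str) -> Hashable:
        replies = [replica(request) for replica in self.active]
        while True:
            value, votes = Counter(replies).most_common(1)[0]
            if votes >= self.f + 1:
                # With at most f faulty replicas, f+1 matching replies
                # must include at least one correct replica's reply.
                return value
            if not self.dormant:
                raise RuntimeError("no f+1 matching replies: more than f faults?")
            # Replies disagree: wake one dormant replica (one at a time is a
            # simplifying assumption) and have it execute the same request.
            replica = self.dormant.pop(0)
            self.active.append(replica)
            self.activations += 1
            replies.append(replica(request))


# Hypothetical example replicas for illustration only.
def correct_replica(request: str) -> str:
    return f"ok:{request}"


def faulty_replica(request: str) -> str:
    return "corrupted"
```

For example, `ZZExecutor(f=1, replicas=[correct_replica, faulty_replica, correct_replica]).execute("GET /")` returns `"ok:GET /"` after waking the single dormant replica, whereas three correct replicas answer with no activation at all. With f=1, `replicas_in_normal_case` gives 2 running execution replicas instead of 3. In this sketch a woken replica simply stays active for later requests.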

      Published in
      EuroSys '11: Proceedings of the sixth conference on Computer systems
      April 2011
      370 pages
      ISBN: 9781450306348
      DOI: 10.1145/1966445

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      EuroSys '11 paper acceptance rate: 24 of 161 submissions, 15%. Overall acceptance rate: 241 of 1,308 submissions, 18%.