
REINFORCE: achieving efficient failure resiliency for network function virtualization based services

Published: 04 December 2018

ABSTRACT

Ensuring high availability (HA) for software-based networks is a critical design feature that will help the adoption of software-based network functions (NFs) in production networks. It is important for NFs to avoid outages and maintain mission-critical operations. However, HA support for NFs on the critical data path can result in unacceptable performance degradation. We present REINFORCE, an integrated framework to support efficient resiliency for NFs and NF service chains. REINFORCE includes timely failure detection and consistent failover mechanisms. REINFORCE replicates state to standby NFs (local and remote) while enforcing correctness. It minimizes the number of state transfers by exploiting the concept of external synchrony, and leverages opportunistic batching and multi-buffering to optimize performance. Experimental results show that, even at line-rate packet processing (10 Gbps), REINFORCE achieves chain-level failover across servers in a LAN (or within the same node) within 10 ms (100 μs), incurring less than 10% (1%) performance overhead, and adds average latency of only ~400 μs (5 μs), with a worst-case latency of less than 1 ms (10 μs).
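
The idea of external synchrony combined with batching can be illustrated with a small toy model. The sketch below is not REINFORCE's implementation; the class name `ExternallySynchronousNF`, the `batch_size` parameter, and the use of a plain dictionary as the standby replica are all assumptions made for illustration. It shows an NF that holds output packets until the state changes they depend on have been copied to a standby, so that one state transfer can cover a whole batch of packets.

```python
from collections import deque


class ExternallySynchronousNF:
    """Toy NF (hypothetical): counts packets per flow and releases output
    only after the state it depends on has been copied to a standby."""

    def __init__(self, replica, batch_size=4):
        self.state = {}             # primary copy: per-flow packet counters
        self.replica = replica      # dict standing in for a standby NF
        self.batch_size = batch_size
        self.dirty = {}             # state updates not yet replicated
        self.held = deque()         # output packets awaiting replication
        self.released = []          # packets visible to the outside world
        self.transfers = 0          # number of state transfers performed

    def process(self, flow, payload):
        # Update local state, remember the change, and hold the output.
        self.state[flow] = self.state.get(flow, 0) + 1
        self.dirty[flow] = self.state[flow]
        self.held.append((flow, payload))
        if len(self.held) >= self.batch_size:
            self.flush()

    def flush(self):
        # One transfer covers every update in the batch; only then are the
        # held packets released to the outside world.
        if not self.held:
            return
        self.replica.update(self.dirty)
        self.transfers += 1
        self.dirty.clear()
        while self.held:
            self.released.append(self.held.popleft())


if __name__ == "__main__":
    standby = {}
    nf = ExternallySynchronousNF(standby, batch_size=4)
    for i in range(6):
        nf.process("f1" if i % 2 == 0 else "f2", i)
    nf.flush()  # e.g. triggered by a timer when traffic is light
    print(nf.transfers, len(nf.released), standby)
```

In this model six packets need only two state transfers, and no packet becomes externally visible before the standby holds the state it reflects. A real system would also bound how long packets may be held, which this sketch leaves to the caller via `flush()`.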



          • Published in

            CoNEXT '18: Proceedings of the 14th International Conference on emerging Networking EXperiments and Technologies
            December 2018
            408 pages
            ISBN:9781450360807
            DOI:10.1145/3281411

            Copyright © 2018 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate: 198 of 789 submissions, 25%
