DOI: 10.1145/2934872.2934897
research-article
ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware

Published: 22 August 2016

ABSTRACT

Highly flexible software network functions (NFs) are crucial components for enabling multi-tenancy in the cloud. However, software packet processing on a commodity server has limited capacity and induces high latency. While software NFs could scale out using more servers, doing so adds significant cost. This paper focuses on accelerating NFs with programmable hardware, i.e., FPGAs, which are now a mature technology and inexpensive for datacenters. However, FPGAs are predominantly programmed using low-level hardware description languages (HDLs), which are hard to code and difficult to debug. More importantly, HDLs are almost inaccessible to most software programmers. This paper presents ClickNP, an FPGA-accelerated platform for highly flexible and high-performance NFs with commodity servers. ClickNP is highly flexible as it is completely programmable using high-level C-like languages, and it exposes a modular programming abstraction that resembles the Click Modular Router. ClickNP also delivers high performance. Our prototype NFs can process traffic at up to 200 million packets per second with ultra-low latency ($< 2\,\mu$s). Compared to existing software counterparts, ClickNP with FPGA improves throughput by 10x while reducing latency by 10x. To the best of our knowledge, ClickNP is the first FPGA-accelerated platform for NFs written completely in a high-level language and achieving 40 Gbps line rate at any packet size.
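To give a sense of what "40 Gbps line rate at any packet size" demands, the following back-of-the-envelope sketch computes the packet rate needed to saturate a link at several frame sizes. It is not taken from the paper. It assumes standard Ethernet framing, where each frame carries 8 extra bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter-frame gap, and the helper name `line_rate_pps` is purely illustrative.

```python
# Illustrative arithmetic only (not from the paper): frames per second
# needed to saturate an Ethernet link at a given frame size.

# Assumption: standard Ethernet per-frame overhead on the wire.
PREAMBLE_SFD_BYTES = 8     # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # minimum gap between frames, in byte times


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Return the maximum frames per second for frames of `frame_bytes`."""
    wire_bytes = frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES
    return link_bps / (wire_bytes * 8)


if __name__ == "__main__":
    # 64 bytes is the minimum Ethernet frame; 1518 is the classic maximum.
    for size in (64, 128, 512, 1518):
        mpps = line_rate_pps(40e9, size) / 1e6
        print(f"{size:>5} B frames: {mpps:6.2f} Mpps")
```

Under these assumptions, minimum-size 64-byte frames require roughly 59.5 million packets per second on a single 40 Gbps link. Small packets are therefore the hardest case for sustaining line rate.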

          Published in

          SIGCOMM '16: Proceedings of the 2016 ACM SIGCOMM Conference
          August 2016
          645 pages
          ISBN: 9781450341936
          DOI: 10.1145/2934872

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          SIGCOMM '16 Paper Acceptance Rate: 39 of 231 submissions, 17%
          Overall Acceptance Rate: 554 of 3,547 submissions, 16%