ABSTRACT
Highly flexible software network functions (NFs) are crucial for enabling multi-tenancy in the cloud. However, software packet processing on a commodity server has limited capacity and induces high latency. While software NFs could scale out using more servers, doing so adds significant cost. This paper focuses on accelerating NFs with programmable hardware, i.e., FPGAs, which are now a mature technology and inexpensive for datacenters. However, FPGAs are predominantly programmed using low-level hardware description languages (HDLs), which are hard to code and difficult to debug. More importantly, HDLs are almost inaccessible to most software programmers. This paper presents ClickNP, an FPGA-accelerated platform for highly flexible and high-performance NFs on commodity servers. ClickNP is highly flexible as it is completely programmable using high-level C-like languages, and it exposes a modular programming abstraction that resembles the Click Modular Router. ClickNP is also high performance. Our prototype NFs can process traffic at up to 200 million packets per second with ultra-low latency ($< 2\,\mu$s). Compared to existing software counterparts, ClickNP with FPGA improves throughput by 10x while reducing latency by 10x. To the best of our knowledge, ClickNP is the first FPGA-accelerated platform for NFs that is written completely in a high-level language and achieves 40 Gbps line rate at any packet size.
Index Terms
- ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware