ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware

Authors:
Bojie Li, USTC and Microsoft Research
Kun Tan, Microsoft Research
Layong (Larry) Luo, Microsoft
Yanqing Peng, SJTU and Microsoft Research
Renqian Luo, USTC and Microsoft Research
Ningyi Xu, Microsoft Research
Yongqiang Xiong, Microsoft Research
Peng Cheng, Microsoft Research
Enhong Chen, USTC

SIGCOMM '16: Proceedings of the 2016 ACM SIGCOMM Conference, August 2016, Pages 1–14. https://doi.org/10.1145/2934872.2934897

Published: 22 August 2016

ABSTRACT

Highly flexible software network functions (NFs) are crucial for enabling multi-tenancy in the cloud. However, software packet processing on a commodity server has limited capacity and induces high latency. While software NFs could scale out using more servers, doing so adds significant cost. This paper focuses on accelerating NFs with programmable hardware, i.e., FPGAs, which are now a mature technology and inexpensive for datacenters. However, FPGAs are predominantly programmed using low-level hardware description languages (HDLs), which are hard to code and difficult to debug. More importantly, HDLs are almost inaccessible to most software programmers. This paper presents ClickNP, an FPGA-accelerated platform for highly flexible and high-performance NFs with commodity servers. ClickNP is highly flexible as it is completely programmable using high-level C-like languages, and exposes a modular programming abstraction that resembles the Click Modular Router. ClickNP is also high performance. Our prototype NFs can process traffic at up to 200 million packets per second with ultra-low latency ($< 2\,\mu$s). Compared to existing software counterparts, ClickNP's FPGA-based NFs improve throughput by 10x while reducing latency by 10x. To the best of our knowledge, ClickNP is the first FPGA-accelerated platform for NFs that is written completely in a high-level language and achieves 40 Gbps line rate at any packet size.
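
To give a sense of what "40 Gbps line rate at any packet size" demands, the following is a minimal back-of-the-envelope sketch, not taken from the paper. It assumes standard Ethernet framing, where each frame carries 20 extra bytes on the wire (preamble, start-of-frame delimiter, and inter-frame gap); the function name `line_rate_pps` and the chosen frame sizes are illustrative only.

```python
# Ethernet framing overhead per packet on the wire (standard values, an
# assumption of this sketch): 7-byte preamble + 1-byte start-of-frame
# delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 7 + 1 + 12


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Packets per second needed to saturate a link with frames of one size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    # Hypothetical frame sizes, from the Ethernet minimum up to 1518 bytes.
    for size in (64, 128, 512, 1518):
        mpps = line_rate_pps(40e9, size) / 1e6
        print(f"{size:>5} B frames: {mpps:6.2f} Mpps")
```

Under these assumptions, minimum-size 64-byte frames require roughly 59.5 million packets per second on a single 40 Gbps link, which is why small packets are the hard case for sustaining line rate.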



Index Terms

1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
      1. Hardware-software codesign
2. Networks
  1. Network components
    1. Middle boxes / network appliances
  2. Network types
    1. Data center networks


Published in
SIGCOMM '16: Proceedings of the 2016 ACM SIGCOMM Conference
August 2016
645 pages
ISBN: 9781450341936
DOI: 10.1145/2934872
General Chairs:
Marinho Barcellos (UFRGS), Jon Crowcroft (University of Cambridge)
Program Chairs:
Amin Vahdat (Google), Sachin Katti (Stanford University)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Compiler
FPGA
Network Function Virtualization
Reconfigurable Hardware
Qualifiers
- research-article
Acceptance Rates
SIGCOMM '16 Paper Acceptance Rate: 39 of 231 submissions, 17%. Overall Acceptance Rate: 554 of 3,547 submissions, 16%.