PacketShader: a GPU-accelerated software router

Authors:
Sangjin Han, Keon Jang, KyoungSoo Park, Sue Moon
KAIST, Daejeon, South Korea
SIGCOMM '10: Proceedings of the ACM SIGCOMM 2010 conference, August 2010, Pages 195–206
https://doi.org/10.1145/1851182.1851207

Published: 30 August 2010

ABSTRACT

We present PacketShader, a high-performance software router framework for general packet processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the massively-parallel processing power of GPU to address the CPU bottleneck in current software routers. Combined with our high-performance packet I/O engine, PacketShader outperforms existing software routers by more than a factor of four, forwarding 64B IPv4 packets at 39 Gbps on a single commodity PC. We have implemented IPv4 and IPv6 forwarding, OpenFlow switching, and IPsec tunneling to demonstrate the flexibility and performance advantage of PacketShader. The evaluation results show that GPU brings significantly higher throughput over the CPU-only implementation, confirming the effectiveness of GPU for computation and memory-intensive operations in packet processing.

Index Terms

PacketShader: a GPU-accelerated software router
1. Networks
  1. Network components
    1. Intermediate nodes
      1. Routers
  2. Network protocols

Published in
SIGCOMM '10: Proceedings of the ACM SIGCOMM 2010 conference
August 2010
500 pages
ISBN: 9781450302012
DOI: 10.1145/1851182
General Chairs: Shiv Kalyanaraman (IBM Research, India), Venkat Padmanabhan (Microsoft Research, India), K. K. Ramakrishnan (IBM Research, India), Rajeev Shorey (NIIT University, India)
Program Chairs: K. K. Ramakrishnan (IBM Research, India), Geoffrey M. Voelker (University of California, San Diego, USA)

ACM SIGCOMM Computer Communication Review Volume 40, Issue 4
SIGCOMM '10
October 2010
481 pages
ISSN: 0146-4833
DOI: 10.1145/1851275
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CUDA
GPU
software router
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 554 of 3,547 submissions, 16%