DOI: 10.1145/1851182.1851207
research-article

PacketShader: a GPU-accelerated software router

Published: 30 August 2010

ABSTRACT

We present PacketShader, a high-performance software router framework for general packet processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the massively parallel processing power of GPUs to address the CPU bottleneck in current software routers. Combined with our high-performance packet I/O engine, PacketShader outperforms existing software routers by more than a factor of four, forwarding 64B IPv4 packets at 39 Gbps on a single commodity PC. We have implemented IPv4 and IPv6 forwarding, OpenFlow switching, and IPsec tunneling to demonstrate the flexibility and performance advantage of PacketShader. The evaluation results show that the GPU delivers significantly higher throughput than the CPU-only implementation, confirming the effectiveness of GPUs for computation- and memory-intensive operations in packet processing.
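
To give a sense of scale for the 39 Gbps figure with 64B packets, the following is a minimal back-of-the-envelope sketch in Python. The function names and the `overhead_bytes` parameter are illustrative assumptions, not part of PacketShader. The abstract does not say whether the reported throughput counts Ethernet framing overhead, so the overhead is left as a parameter; the 20-byte value used in the second call (preamble plus inter-frame gap) is one common accounting convention, not a figure taken from the paper.

```python
def packets_per_second(throughput_gbps: float, packet_bytes: int,
                       overhead_bytes: int = 0) -> float:
    """Packet rate implied by a throughput figure.

    overhead_bytes is a hypothetical per-packet framing overhead that is
    counted toward the throughput (0 means payload-only accounting).
    """
    bits_per_packet = (packet_bytes + overhead_bytes) * 8
    return throughput_gbps * 1e9 / bits_per_packet


def per_packet_budget_ns(pps: float) -> float:
    """Average time available per packet, in nanoseconds, at a given rate."""
    return 1e9 / pps


# Figures from the abstract: 64B packets at 39 Gbps.
rate_no_overhead = packets_per_second(39, 64)
# Assumption: 20 bytes of preamble and inter-frame gap counted per packet.
rate_with_overhead = packets_per_second(39, 64, overhead_bytes=20)

print(f"{rate_no_overhead / 1e6:.1f} Mpps without framing overhead")
print(f"{rate_with_overhead / 1e6:.1f} Mpps with 20B framing overhead")
print(f"{per_packet_budget_ns(rate_no_overhead):.1f} ns per packet on average")
```

Under either accounting, the rate is in the tens of millions of packets per second, which leaves an average budget on the order of ten to twenty nanoseconds per packet across the whole machine. That is the kind of load at which per-packet CPU work becomes the bottleneck the abstract describes.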

        Published in

        SIGCOMM '10: Proceedings of the ACM SIGCOMM 2010 conference
        August 2010
        500 pages
        ISBN: 9781450302012
        DOI: 10.1145/1851182

        ACM SIGCOMM Computer Communication Review, Volume 40, Issue 4
        SIGCOMM '10
        October 2010
        481 pages
        ISSN: 0146-4833
        DOI: 10.1145/1851275

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 554 of 3,547 submissions, 16%
