research-article

INCA: in-network compute assistance

Authors:
Whit Schonbein, Sandia National Laboratories Center for Computational Research
Ryan E. Grant, Sandia National Laboratories Center for Computational Research
Matthew G. F. Dosanjh, Sandia National Laboratories Center for Computational Research
Dorian Arnold, Emory University

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2019, Article No. 54, Pages 1–13. https://doi.org/10.1145/3295500.3356153

Published: 17 November 2019

ABSTRACT

Current proposals for in-network data processing operate on data as it streams through a network switch or endpoint. Since compute resources must be available when data arrives, these approaches provide deadline-based models of execution. This paper introduces a deadline-free general compute model for network endpoints called INCA: In-Network Compute Assistance. INCA builds upon contemporary NIC offload capabilities to provide on-NIC, deadline-free, general-purpose compute capacities that can be utilized when the network is inactive. We demonstrate that INCA is Turing complete and provide a detailed design for extending existing hardware to support this model. We evaluate runtimes for a selection of kernels, including several optimizations, and show that INCA can provide up to an 11% speedup for applications with minimal code modifications and between 25% and 37% when applications are optimized for INCA.

Index Terms

INCA: in-network compute assistance
1. Networks
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Software infrastructure
        Middleware
        Message oriented middleware

Published in
SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2019
1921 pages
ISBN: 9781450362290
DOI: 10.1145/3295500
General Chair: Michela Taufer
Program Chairs: Pavan Balaji, Antonio J. Peña
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%