research-article

DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip

Authors:
Sheng Ma, National University of Defense Technology, Changsha, China
Natalie Enright Jerger, University of Toronto, Toronto, ON, Canada
Zhiying Wang, National University of Defense Technology, Changsha, China
ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture, June 2011, Pages 413–424, https://doi.org/10.1145/2000064.2000113

Published: 04 June 2011

ABSTRACT

With the emergence of many-core architectures, it is quite likely that multiple applications will run concurrently on a system. Existing locally and globally adaptive routing algorithms largely overlook issues associated with workload consolidation. The shortsightedness of locally adaptive routing algorithms limits performance due to poor network congestion avoidance. Globally adaptive routing algorithms attack this issue by introducing a congestion propagation network to obtain network status information beyond neighboring nodes. However, they may suffer from intra- and inter-application interference during output port selection for consolidated workloads, coupling the behavior of otherwise independent applications and negatively affecting performance.

To address these two issues, we propose Destination-Based Adaptive Routing (DBAR). We design a novel low-cost congestion propagation network that leverages both local and non-local network information for more accurate congestion estimates. Thus, DBAR offers effective adaptivity for congestion beyond neighboring nodes. More importantly, by integrating the destination into the selection function, DBAR mitigates intra- and inter-application interference and offers dynamic isolation among regions. Experimental results show that DBAR can offer better performance than the best baseline algorithm for all measured configurations; it is well suited for workload consolidation. The wiring overhead of DBAR is low and DBAR provides improvement in the energy-delay product for medium and high injection rates.
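The idea of integrating the destination into the selection function can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `select_output_port`, the per-hop congestion lists, and the use of an average as the cost are all assumptions made for illustration. The sketch assumes minimal routing in a 2D mesh where each router already holds congestion estimates for the nodes ahead of it in each dimension, and it only counts the nodes that lie between the current router and the destination.

```python
from typing import List


def select_output_port(dx: int, dy: int,
                       congestion_x: List[int],
                       congestion_y: List[int]) -> str:
    """Choose an output dimension for a minimal route in a 2D mesh.

    dx, dy: remaining hops toward the destination in each dimension
            (non-negative; hypothetical inputs for this sketch).
    congestion_x[i], congestion_y[i]: assumed congestion estimate of the
            node i+1 hops ahead in that dimension, toward the destination.
    """
    if dx == 0 and dy == 0:
        return "LOCAL"          # packet has arrived
    if dx == 0:
        return "Y"              # only one productive dimension left
    if dy == 0:
        return "X"
    # Destination-based truncation: ignore congestion beyond the
    # destination's column/row, since the packet never travels there.
    cost_x = sum(congestion_x[:dx]) / dx
    cost_y = sum(congestion_y[:dy]) / dy
    return "X" if cost_x <= cost_y else "Y"


# Heavy congestion 3 and 4 hops away in X belongs to another region;
# the destination is only 2 hops away in X, so it is ignored.
print(select_output_port(2, 3, [0, 0, 9, 9], [1, 1, 1, 0]))
```

A selection function that summed each list in full would steer this packet toward Y because of congestion it will never encounter, which is the kind of cross-region interference the abstract describes.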

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
2. Hardware
  1. Integrated circuits
    1. Interconnect

Published in
ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
June 2011
488 pages
ISBN:9781450304726
DOI:10.1145/2000064
General Chairs: Ravi Iyer (Intel), Qing Yang (University of Rhode Island)
Program Chair: Antonio González (Intel and UPC)
ACM SIGARCH Computer Architecture News Volume 39, Issue 3
ISCA '11
June 2011
462 pages
ISSN:0163-5964
DOI:10.1145/2024723
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
networks-on-chip
routing algorithm
workload consolidation
Acceptance Rates
Overall Acceptance Rate: 543 of 3,203 submissions, 17%