ABSTRACT
With the emergence of many-core architectures, it is quite likely that multiple applications will run concurrently on a system. Existing locally and globally adaptive routing algorithms largely overlook issues associated with workload consolidation. The shortsightedness of locally adaptive routing algorithms limits performance due to poor network congestion avoidance. Globally adaptive routing algorithms attack this issue by introducing a congestion propagation network to obtain network status information beyond neighboring nodes. However, they may suffer from intra- and inter-application interference during output port selection for consolidated workloads, coupling the behavior of otherwise independent applications and negatively affecting performance.
To address these two issues, we propose Destination-Based Adaptive Routing (DBAR). We design a novel low-cost congestion propagation network that leverages both local and non-local network information for more accurate congestion estimates. Thus, DBAR offers effective adaptivity for congestion beyond neighboring nodes. More importantly, by integrating the destination into the selection function, DBAR mitigates intra- and inter-application interference and offers dynamic isolation among regions. Experimental results show that DBAR can offer better performance than the best baseline algorithm for all measured configurations; it is well suited for workload consolidation. The wiring overhead of DBAR is low and DBAR provides improvement in the energy-delay product for medium and high injection rates.
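To make the idea of "integrating the destination into the selection function" concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not specify the mechanism, so this assumes a 2D mesh with minimal adaptive routing and a one-bit congestion flag per router, and the names `congested_hops` and `select_output` are hypothetical. It illustrates a single point: when choosing between two productive output ports, only routers between the current node and the destination are considered, so congestion in regions the packet will never cross cannot influence the choice.

```python
from typing import Dict, Tuple

Node = Tuple[int, int]  # (x, y) coordinates in a 2D mesh (assumed topology)


def congested_hops(congested: Dict[Node, bool], cur: Node, dst: Node, axis: int) -> int:
    """Count congested routers along one dimension, from the next hop up to
    the destination's coordinate in that dimension, and nothing beyond it."""
    step = 1 if dst[axis] > cur[axis] else -1
    count = 0
    pos = cur[axis]
    while pos != dst[axis]:
        pos += step
        node = (pos, cur[1]) if axis == 0 else (cur[0], pos)
        count += int(congested.get(node, False))
    return count


def select_output(congested: Dict[Node, bool], cur: Node, dst: Node) -> str:
    """Pick an output port among the minimal (productive) directions.

    Port names are an assumption of this sketch: E/W for +x/-x, N/S for +y/-y.
    """
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx == 0 and dy == 0:
        return "LOCAL"  # packet has arrived
    x_port = "E" if dx > 0 else "W"
    y_port = "N" if dy > 0 else "S"
    if dy == 0:
        return x_port  # only one productive direction remains
    if dx == 0:
        return y_port
    # Both directions are productive: compare congestion restricted to the
    # routers that lie between the current node and the destination.
    cx = congested_hops(congested, cur, dst, axis=0)
    cy = congested_hops(congested, cur, dst, axis=1)
    return x_port if cx <= cy else y_port


# Congestion at (3, 0) lies beyond the destination column and is ignored.
status = {(3, 0): True, (0, 1): True}
print(select_output(status, cur=(0, 0), dst=(2, 2)))
```

In this sketch a congested router outside the span between source and destination, for example one belonging to another application's region, never changes the port choice, which is the isolation property the abstract describes at a high level.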
DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip (ISCA '11)