High-Performance and Energy-Efficient Network-on-Chip Architectures for Graph Analytics

Authors:
Karthi Duraisamy, Washington State University at Pullman, Pullman, WA
Hao Lu, Washington State University at Pullman, Pullman, WA
Partha Pratim Pande, Washington State University at Pullman, Pullman, WA
Ananth Kalyanaraman, Washington State University at Pullman, Pullman, WA

ACM Transactions on Embedded Computing Systems, Volume 15, Issue 4, Article No. 66, pp. 1–26. https://doi.org/10.1145/2961027

Published: 01 September 2016

Abstract

With its applicability spanning numerous data-driven fields, the implementation of graph analytics on multicore platforms is gaining momentum. One of the most important components of a multicore chip is its communication backbone. Due to inherent irregularities in data movements manifested by graph-based applications, it is essential to design efficient on-chip interconnection architectures for multicore chips performing graph analytics. In this article, we present a detailed analysis of the traffic patterns generated by graph-based applications when mapped to multicore chips. Based on this analysis, we explore the design-space for the Network-on-Chip (NoC) architecture to enable an efficient implementation of graph analytics. We principally consider three types of NoC architectures, viz., traditional mesh, small-world, and high-radix networks. We demonstrate that the small-world-network-enabled wireless NoC (WiNoC) is the most suitable platform for executing the considered graph applications. The WiNoC achieves an average of 38% and 18% full-system Energy Delay Product savings compared to wireline-mesh and high-radix NoCs, respectively.
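The Energy Delay Product (EDP) metric quoted above is the product of the energy consumed and the execution time, and a "saving" is the relative reduction against a baseline. A minimal sketch of that arithmetic follows. The function names and the numeric inputs are hypothetical, chosen only for illustration; they are not measurements from the article.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Return the Energy Delay Product (energy multiplied by execution time)."""
    return energy_joules * delay_seconds


def edp_savings(baseline_edp: float, candidate_edp: float) -> float:
    """Return the fractional EDP reduction of a candidate relative to a baseline."""
    if baseline_edp <= 0:
        raise ValueError("baseline EDP must be positive")
    return 1.0 - candidate_edp / baseline_edp


# Hypothetical inputs (assumptions, not data from the article).
baseline = energy_delay_product(10.0, 2.0)   # e.g. a mesh NoC run
candidate = energy_delay_product(9.0, 1.6)   # e.g. a wireless NoC run
print(f"EDP savings: {edp_savings(baseline, candidate):.0%}")
```

Because EDP multiplies energy by delay, a design that reduces both at once is rewarded more than one that trades one for the other.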

Index Terms

1. Hardware
  1. Very large scale integration design
    1. Analog and mixed-signal circuits
      1. Radio frequency and wireless circuits
    2. Design reuse and communication-based design
      1. Network on chip
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Published in
ACM Transactions on Embedded Computing Systems Volume 15, Issue 4
Special Issue on ESWEEK2015 and Regular Papers
August 2016
411 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/2982215
Editor:
Sandeep K. Shukla
Indian Institute of Technology, India
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 1 September 2016
- Revised: 1 May 2016
- Accepted: 1 May 2016
- Received: 1 December 2015

Author Tags
Graph analytics
community detection
graph coloring
wireless NoCs
Qualifiers
- research-article
- Research
- Refereed