Abstract
With its applicability spanning numerous data-driven fields, the implementation of graph analytics on multicore platforms is gaining momentum. One of the most important components of a multicore chip is its communication backbone. Because graph-based applications exhibit inherently irregular data movement, it is essential to design efficient on-chip interconnection architectures for multicore chips performing graph analytics. In this article, we present a detailed analysis of the traffic patterns generated by graph-based applications when mapped to multicore chips. Based on this analysis, we explore the design space of Network-on-Chip (NoC) architectures to enable an efficient implementation of graph analytics. We principally consider three types of NoC architectures, namely traditional mesh, small-world, and high-radix networks. We demonstrate that the small-world-network-enabled wireless NoC (WiNoC) is the most suitable platform for executing the considered graph applications. The WiNoC achieves average full-system energy-delay product (EDP) savings of 38% and 18% compared to wireline mesh and high-radix NoCs, respectively.
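Two ideas in the abstract can be made concrete with a small sketch: why adding long-range links to a mesh (the small-world idea) shortens communication paths, and how a relative EDP saving is computed (EDP = energy × delay). The code below is an illustrative sketch under stated assumptions, not the authors' methodology: it uses an 8×8 mesh, a few hand-picked long-range shortcuts standing in for wireless links (the actual WiNoC topology is not specified here), and average hop count as a rough latency proxy. The helper names `mesh_graph`, `add_shortcuts`, `avg_hops`, and `edp_saving` are hypothetical.

```python
from collections import deque
from itertools import product


def mesh_graph(k):
    """Adjacency sets for a k x k 2D mesh; node id = row * k + col."""
    adj = {n: set() for n in range(k * k)}
    for r, c in product(range(k), range(k)):
        n = r * k + c
        if c + 1 < k:  # link to the east neighbor
            adj[n].add(n + 1)
            adj[n + 1].add(n)
        if r + 1 < k:  # link to the south neighbor
            adj[n].add(n + k)
            adj[n + k].add(n)
    return adj


def add_shortcuts(adj, links):
    """Return a copy of adj with extra bidirectional long-range links.

    Assumption: each shortcut stands in for a single-hop wireless link.
    """
    new = {n: set(nbrs) for n, nbrs in adj.items()}
    for a, b in links:
        new[a].add(b)
        new[b].add(a)
    return new


def avg_hops(adj):
    """Average shortest-path hop count over ordered pairs of distinct nodes (BFS).

    Assumes the graph is connected.
    """
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs


def edp_saving(energy_base, delay_base, energy_new, delay_new):
    """Fractional energy-delay product saving of a design versus a baseline."""
    return 1.0 - (energy_new * delay_new) / (energy_base * delay_base)


if __name__ == "__main__":
    mesh = mesh_graph(8)
    # Hypothetical shortcuts: opposite corners and opposite edge midpoints.
    shortcuts = [(0, 63), (7, 56), (3, 59), (24, 31)]
    small_world = add_shortcuts(mesh, shortcuts)
    print(f"mesh: {avg_hops(mesh):.3f} hops")
    print(f"small-world: {avg_hops(small_world):.3f} hops")
    # Hypothetical normalized numbers: 20% less energy and 20% less delay.
    print(f"EDP saving: {edp_saving(1.0, 1.0, 0.8, 0.8):.0%}")
```

For the plain 8×8 mesh the average hop count is 16/3 ≈ 5.333; any shortcut can only shorten or preserve paths, so the small-world variant always averages fewer hops. Note that energy and delay multiply in EDP, so modest reductions in both compound into a larger EDP saving.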
Index Terms
- High-Performance and Energy-Efficient Network-on-Chip Architectures for Graph Analytics