High-Performance and Energy-Efficient Network-on-Chip Architectures for Graph Analytics

Published: 01 September 2016

Abstract

Graph analytics applies to numerous data-driven fields, and its implementation on multicore platforms is gaining momentum. One of the most important components of a multicore chip is its communication backbone. Because graph-based applications exhibit inherently irregular data movement, it is essential to design efficient on-chip interconnection architectures for multicore chips that perform graph analytics. In this article, we present a detailed analysis of the traffic patterns generated by graph-based applications when mapped to multicore chips. Based on this analysis, we explore the design space of Network-on-Chip (NoC) architectures to enable an efficient implementation of graph analytics. We principally consider three types of NoC architectures: traditional mesh, small-world, and high-radix networks. We demonstrate that the small-world-network-enabled wireless NoC (WiNoC) is the most suitable platform for executing the considered graph applications. On average, the WiNoC achieves full-system energy-delay product (EDP) savings of 38% and 18% compared to wireline-mesh and high-radix NoCs, respectively.
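
As a reading aid for the energy-delay product figures quoted above, the sketch below shows how such a savings percentage is commonly computed: the metric multiplies energy consumed by execution time, and savings are expressed relative to a baseline design. The function names and all numeric inputs are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from the article.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy consumed multiplied by execution time."""
    return energy_joules * delay_seconds


def relative_savings(baseline: float, candidate: float) -> float:
    """Fractional reduction of a candidate metric relative to a baseline."""
    return 1.0 - candidate / baseline


# Hypothetical inputs (NOT taken from the article), for illustration only.
baseline_edp = energy_delay_product(energy_joules=10.0, delay_seconds=2.0)
candidate_edp = energy_delay_product(energy_joules=9.0, delay_seconds=1.6)

savings = relative_savings(baseline_edp, candidate_edp)
print(f"EDP savings relative to baseline: {savings:.1%}")
```

Because the metric is a product, a design that lowers both energy and delay by modest amounts can show a larger combined saving than either reduction alone.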

        • Published in

          ACM Transactions on Embedded Computing Systems, Volume 15, Issue 4
          Special Issue on ESWEEK2015 and Regular Papers
          August 2016
          411 pages
          ISSN: 1539-9087
          EISSN: 1558-3465
          DOI: 10.1145/2982215

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 September 2016
          • Revised: 1 May 2016
          • Accepted: 1 May 2016
          • Received: 1 December 2015

          Qualifiers

          • research-article
          • Research
          • Refereed
