DOI: 10.1145/3174243.3174260
research-article

Accelerating Graph Analytics by Co-Optimizing Storage and Access on an FPGA-HMC Platform

Published: 15 February 2018

ABSTRACT

Graph analytics, which explores the relationships among interconnected entities, is becoming increasingly important due to its broad applicability, from machine learning to the social sciences. However, because graph computations have irregular data access patterns, performance is a major challenge for graph processing systems. The algorithms, software, and hardware that have been tailored for mainstream parallel applications are generally not effective for massive, sparse graphs from real-world problems, due to their complex and irregular structures. To address the performance issues in large-scale graph analytics, we leverage the exceptional random access performance of the emerging Hybrid Memory Cube (HMC) combined with the flexibility and efficiency of modern FPGAs. In particular, we develop a collaborative software/hardware technique to perform a level-synchronized Breadth First Search (BFS) on an FPGA-HMC platform. From the software perspective, we develop an architecture-aware graph clustering algorithm that exploits the FPGA-HMC platform's capability to improve data locality and memory access efficiency. From the hardware perspective, we further improve the FPGA-HMC graph processor architecture by designing a memory request merging unit to take advantage of the increased data locality resulting from graph clustering. We evaluate the performance of our BFS implementation using the AC-510 development kit from Micron and achieve a $2.8\times$ average performance improvement compared to the latest FPGA-HMC based graph processing system over a set of benchmarks from a wide range of applications.
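
As a rough illustration of two ideas named in the abstract, the sketch below shows a textbook level-synchronized BFS over a graph in compressed sparse row (CSR) form, and a toy count of how many block-sized memory requests a batch of vertex reads collapses into when nearby vertex IDs share a block. This is a software sketch under stated assumptions, not the authors' FPGA design: the CSR layout, the function names `level_synchronized_bfs` and `count_merged_requests`, and the `entries_per_block` parameter are all hypothetical and chosen only for illustration.

```python
from typing import List


def level_synchronized_bfs(row_ptr: List[int], col_idx: List[int], source: int) -> List[int]:
    """Return the BFS level of every vertex (-1 if unreachable) for a CSR graph."""
    num_vertices = len(row_ptr) - 1
    level = [-1] * num_vertices
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        # Every vertex in the current level is expanded before the next level starts.
        for u in frontier:
            for v in col_idx[row_ptr[u]:row_ptr[u + 1]]:
                if level[v] == -1:  # first visit fixes the level
                    level[v] = depth + 1
                    next_frontier.append(v)
        frontier = next_frontier
        depth += 1
    return level


def count_merged_requests(vertex_ids: List[int], entries_per_block: int) -> int:
    """Count the distinct aligned blocks touched by a batch of vertex reads.

    Assumption for illustration: vertex data is stored contiguously, so
    vertex v lives in block v // entries_per_block.
    """
    return len({v // entries_per_block for v in vertex_ids})


# Undirected example graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
row_ptr = [0, 2, 4, 6, 9, 10]
col_idx = [1, 2, 0, 3, 0, 3, 1, 2, 4, 3]
levels = level_synchronized_bfs(row_ptr, col_idx, source=0)

# Clustered IDs share blocks; scattered IDs each need their own request.
clustered = count_merged_requests([0, 1, 2, 3, 64, 65], entries_per_block=16)
scattered = count_merged_requests([0, 16, 32, 48], entries_per_block=16)
```

In this toy model, placing vertices that are accessed together at nearby IDs reduces the number of distinct blocks requested, which is the kind of locality a request merging step can exploit.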


          Published in

            FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
            February 2018
            310 pages
            ISBN: 9781450356145
            DOI: 10.1145/3174243

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            FPGA '18 Paper Acceptance Rate: 10 of 116 submissions, 9%
            Overall Acceptance Rate: 125 of 627 submissions, 20%
