research-article

Accelerating Graph Analytics by Co-Optimizing Storage and Access on an FPGA-HMC Platform

Authors:
Soroosh Khoram, University of Wisconsin-Madison, Madison, WI, USA
Jialiang Zhang, University of Wisconsin-Madison, Madison, WI, USA
Maxwell Strange, University of Wisconsin-Madison, Madison, WI, USA
Jing Li, University of Wisconsin-Madison, Madison, WI, USA

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2018, Pages 239–248
https://doi.org/10.1145/3174243.3174260

Published: 15 February 2018

ABSTRACT

Graph analytics, which explores the relationships among interconnected entities, is becoming increasingly important due to its broad applicability, from machine learning to social sciences. However, due to the irregular data access patterns in graph computations, one major challenge for graph processing systems is performance. The algorithms, software, and hardware that have been tailored for mainstream parallel applications are generally not effective for massive, sparse graphs from real-world problems, due to their complex and irregular structures. To address the performance issues in large-scale graph analytics, we leverage the exceptional random access performance of the emerging Hybrid Memory Cube (HMC) combined with the flexibility and efficiency of modern FPGAs. In particular, we develop a collaborative software/hardware technique to perform a level-synchronized Breadth First Search (BFS) on an FPGA-HMC platform. From the software perspective, we develop an architecture-aware graph clustering algorithm that exploits the FPGA-HMC platform's capability to improve data locality and memory access efficiency. From the hardware perspective, we further improve the FPGA-HMC graph processor architecture by designing a memory request merging unit to take advantage of the increased data locality resulting from graph clustering. We evaluate the performance of our BFS implementation using the AC-510 development kit from Micron and achieve a $2.8\times$ average performance improvement compared to the latest FPGA-HMC-based graph processing system over a set of benchmarks from a wide range of applications.
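
To make the two ideas in the abstract concrete, the following is a minimal software sketch of a level-synchronized BFS that also counts how many vertex-status lookups could be merged when they fall into the same memory block. It is not the authors' hardware design: the CSR layout, the `block_size` parameter, the per-level merging window, and the function names `build_csr` and `bfs_levels` are all assumptions made for illustration only.

```python
from typing import List, Tuple


def build_csr(num_vertices: int,
              edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build a CSR (offsets, neighbors) pair for an undirected graph."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    neighbors = [0] * offsets[-1]
    cursor = offsets[:-1]  # slicing copies: next free slot per vertex
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def bfs_levels(offsets: List[int], neighbors: List[int], source: int,
               block_size: int = 4) -> Tuple[List[int], int, int]:
    """Level-synchronized BFS.

    Returns (level, raw_requests, merged_requests). raw_requests counts one
    lookup per traversed edge; merged_requests counts one lookup per distinct
    block of `block_size` consecutive vertex IDs touched within a level.
    The block size and the whole-level merge window are hypothetical.
    """
    num_vertices = len(offsets) - 1
    level = [-1] * num_vertices
    level[source] = 0
    frontier = [source]
    depth = 0
    raw_requests = 0
    merged_requests = 0
    while frontier:
        # Gather every neighbor lookup issued while expanding this level.
        lookups = [neighbors[i]
                   for u in frontier
                   for i in range(offsets[u], offsets[u + 1])]
        raw_requests += len(lookups)
        # Lookups that hit the same block are counted once (toy merge model).
        merged_requests += len({v // block_size for v in lookups})
        next_frontier = []
        for v in lookups:
            if level[v] == -1:  # first visit: assign the next level
                level[v] = depth + 1
                next_frontier.append(v)
        frontier = next_frontier  # the whole level finishes before the next
        depth += 1
    return level, raw_requests, merged_requests


if __name__ == "__main__":
    offsets, neighbors = build_csr(5, [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)])
    print(bfs_levels(offsets, neighbors, source=0))
```

In this toy model, renumbering vertices so that neighbors receive nearby IDs (the role graph clustering plays in the abstract) lowers `merged_requests` relative to `raw_requests`, since more lookups in a level land in the same block.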


Index Terms

Accelerating Graph Analytics by Co-Optimizing Storage and Access on an FPGA-HMC Platform
1. Hardware
  1. Emerging technologies
    1. Analysis and design of emerging devices and systems
      1. Emerging architectures
  2. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators
      2. Reconfigurable logic applications
2. Theory of computation
  1. Design and analysis of algorithms
    1. Graph algorithms analysis

Published in
FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2018
310 pages
ISBN: 9781450356145
DOI: 10.1145/3174243
General Chair: Jason H. Anderson, University of Toronto, Canada
Program Chair: Kia Bazargan, University of Minnesota, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
graph analytics
graph clustering
hardware accelerators
hybrid memory cube
reconfigurable logic
Qualifiers
- research-article
Acceptance Rates
FPGA '18 Paper Acceptance Rate: 10 of 116 submissions, 9%
Overall Acceptance Rate: 125 of 627 submissions, 20%