research-article

GraphMat: high performance graph analytics made productive

Published: 01 July 2015

Abstract

Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution for bridging the gap between user-friendly graph analytics frameworks and native, hand-optimized code. GraphMat takes vertex programs and maps them to high-performance sparse matrix operations in the backend, so we get the productivity benefits of a vertex programming framework without sacrificing performance. GraphMat is a single-node, multicore graph framework written in C++; it has enabled us to write a diverse set of graph algorithms with effort comparable to that required by other vertex programming frameworks. GraphMat is 1.1-7X faster than high-performance frameworks such as GraphLab, CombBLAS and Galois. It also matches the performance of MapGraph, a GPU-based graph framework, despite running on a CPU platform with significantly lower compute and bandwidth resources. It achieves better multicore scalability (13-15X on 24 cores) than other frameworks and is within 1.2X of native, hand-optimized code on a variety of graph algorithms. Since GraphMat's performance depends mainly on a few scalable, well-understood sparse matrix operations, GraphMat can naturally benefit from the trend of increasing parallelism in future hardware.
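The central idea above, running a vertex program as a sparse matrix operation, can be illustrated with a small sketch. The C++ below is a hypothetical, single-threaded illustration and not GraphMat's actual API: the names `CsrGraph`, `SsspProgram` and `run_vertex_program` are our own. Each iteration computes a generalized product y = A^T x, where x holds messages from the currently active vertices, `process_message` stands in for multiplication and `reduce` for addition. For single-source shortest paths this is the (min, +) semiring.

```cpp
// Hypothetical sketch, not GraphMat's API: a vertex program executed as a
// generalized sparse matrix-vector product (SpMV). All names are illustrative.
#include <cstddef>
#include <limits>
#include <optional>
#include <vector>

// Adjacency matrix A in CSR form: row j lists the out-edges of vertex j.
struct CsrGraph {
    std::vector<std::size_t> row_ptr;  // size = number of vertices + 1
    std::vector<std::size_t> col;      // destination vertex of each edge
    std::vector<double> val;           // weight of each edge
    std::size_t num_vertices() const { return row_ptr.size() - 1; }
};

// Single-source shortest paths as a vertex program. process_message and
// reduce play the roles of multiply and add in SpMV: the (min, +) semiring.
struct SsspProgram {
    using Vertex = double;   // vertex property: tentative distance
    using Message = double;  // message sent along out-edges: sender's distance
    using Reduced = double;  // value combined at the receiving vertex
    static constexpr double kUnreached = std::numeric_limits<double>::infinity();

    Message send_message(const Vertex& v) const { return v; }
    Reduced process_message(Message m, double edge_weight) const { return m + edge_weight; }
    Reduced reduce(Reduced a, Reduced b) const { return a < b ? a : b; }
    // Updates the vertex; returns true if it changed, i.e. it stays active.
    bool apply(Reduced r, Vertex& v) const {
        if (r < v) { v = r; return true; }
        return false;
    }
};

// Repeats y = A^T x over the program's semiring until no vertex is active.
template <typename Program>
void run_vertex_program(const CsrGraph& g, const Program& p,
                        std::vector<typename Program::Vertex>& props,
                        std::vector<bool> active) {
    const std::size_t n = g.num_vertices();
    bool any_active = true;
    while (any_active) {
        // Output vector y; std::nullopt marks vertices that received no message.
        std::vector<std::optional<typename Program::Reduced>> y(n);
        // Only active vertices contribute: x is a sparse vector of messages.
        for (std::size_t j = 0; j < n; ++j) {
            if (!active[j]) continue;
            const auto m = p.send_message(props[j]);
            for (std::size_t k = g.row_ptr[j]; k < g.row_ptr[j + 1]; ++k) {
                const std::size_t i = g.col[k];
                const auto t = p.process_message(m, g.val[k]);
                y[i] = y[i] ? p.reduce(*y[i], t) : t;
            }
        }
        // Apply phase: vertices whose property changed form the next frontier.
        any_active = false;
        for (std::size_t i = 0; i < n; ++i) {
            active[i] = y[i] && p.apply(*y[i], props[i]);
            any_active = any_active || active[i];
        }
    }
}
```

For example, with edges 0->1 (weight 4), 0->2 (1), 2->1 (2), 1->3 (1) and 2->3 (5), starting from distances {0, inf, inf, inf} with only vertex 0 active, the loop converges to {0, 3, 1, 4}. The sketch is sequential and stores y densely, so it deliberately ignores the parallelization and data-layout choices that a high-performance sparse matrix backend would make.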


      Published in

        Proceedings of the VLDB Endowment, Volume 8, Issue 11
        July 2015
        264 pages
        ISSN: 2150-8097

        Publisher

        VLDB Endowment


