research-article
Public Access

Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency Guarantee

Published: 01 July 2021

Abstract

To hide the complexity of the underlying system, graph processing frameworks ask programmers to specify graph computations in user-defined functions (UDFs) of a graph-oriented programming model. Due to the nature of distributed execution, current frameworks cannot precisely enforce the semantics of UDFs, leading to unnecessary computation and communication. This exemplifies a gap between the programming model and runtime execution. This article proposes novel graph processing frameworks for distributed systems and Processing-in-memory (PIM) architectures that precisely enforce loop-carried dependency; i.e., when a condition is satisfied by a neighbor, all following neighbors can be skipped. Our approach instruments the UDFs to express the loop-carried dependency; the distributed execution framework then enforces the precise semantics by performing dependency propagation dynamically. Enforcing loop-carried dependency requires sequential processing of the neighbors of each vertex, which are distributed across different nodes. We propose circulant scheduling in the framework to allow different nodes to process disjoint sets of edges/vertices in parallel while satisfying the sequential requirement. The technique achieves an excellent trade-off between precise semantics and parallelism: the benefits of eliminating unnecessary computation and communication offset the reduced parallelism. We implement a new distributed graph processing framework, SympleGraph, and two variants of runtime systems, GraphS and GraphSR, for PIM-based graph processing architecture, which significantly outperform the state-of-the-art.
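
The idea in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not SympleGraph's actual API or implementation: the names `circulant_schedule`, `pull_step`, and `in_edges`, the modulo vertex-to-block assignment, and the choice of a BFS-style "find any neighbor in the frontier" UDF are all hypothetical. It shows one plausible reading of circulant scheduling: in each step every machine works on a different block of destination vertices, and a per-vertex "condition satisfied" flag carries over to later steps so that the remaining neighbors stored on other machines are skipped.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: in_edges[m][v] lists the in-neighbors of v stored on machine m.
InEdges = List[Dict[int, List[int]]]


def circulant_schedule(p: int) -> List[List[Tuple[int, int]]]:
    """For each of p steps, return (machine, block) pairs.

    Machine m handles block (m + s) % p in step s, so the blocks within one
    step are disjoint and each block visits every machine exactly once.
    """
    return [[(m, (m + s) % p) for m in range(p)] for s in range(p)]


def pull_step(in_edges: InEdges, frontier: Set[int],
              num_vertices: int) -> Tuple[Dict[int, int], int]:
    """Simulate one pull iteration with loop-carried dependency.

    Returns the chosen parent per vertex and the number of edges examined.
    """
    p = len(in_edges)
    satisfied: Set[int] = set()   # dependency flags passed between machines
    parent: Dict[int, int] = {}
    examined = 0
    for step in circulant_schedule(p):
        # The (machine, block) pairs of one step touch disjoint vertices,
        # so a real system could run them in parallel.
        for machine, block in step:
            # Assumption: block b owns the vertices v with v % p == b.
            for v in range(block, num_vertices, p):
                if v in satisfied:
                    continue      # condition already met on an earlier machine
                for u in in_edges[machine].get(v, []):
                    examined += 1
                    if u in frontier:
                        parent[v] = u
                        satisfied.add(v)
                        break     # skip all following neighbors of v
    return parent, examined


if __name__ == "__main__":
    # Two machines, four vertices, six stored in-edges, frontier = {0}.
    edges = [{1: [3], 3: [0]}, {1: [0, 2], 3: [2, 1]}]
    print(pull_step(edges, {0}, 4))
```

In this toy run fewer edges are examined than are stored, because vertex 1 finds a frontier neighbor on the first machine that processes it and its remaining neighbors are never scanned. A real BFS would additionally restrict the scan to unvisited vertices, and a distributed implementation would send the flags over the network between steps.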

    • Published in

      ACM Transactions on Computer Systems, Volume 37, Issue 1-4
      November 2019
      177 pages
      ISSN: 0734-2071
      EISSN: 1557-7333
      DOI: 10.1145/3446674

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 July 2021
      • Accepted: 1 March 2021
      • Revised: 1 December 2020
      • Received: 1 July 2020

      Qualifiers

      • research-article
      • Research
      • Refereed
