ABSTRACT
Computations performed by graph algorithms are data driven, and require a high degree of random data access. Despite the great progress made in disk technology, disks still cannot provide the level of efficient random access required by graph computation. On the other hand, memory-based approaches usually do not scale due to the capacity limit of single machines. In this paper, we introduce Trinity, a general purpose graph engine over a distributed memory cloud. Through optimized memory management and network communication, Trinity supports fast graph exploration as well as efficient parallel computing. In particular, Trinity leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance. These optimizations enable Trinity to support efficient online query processing and offline analytics on large graphs with just a few commodity machines. Furthermore, Trinity provides a high level specification language called TSL for users to declare data schema and communication protocols, which brings great ease-of-use for general purpose graph management and computing. Our experiments show Trinity's performance in both low latency graph queries and high throughput graph analytics on web-scale, billion-node graphs.
Index Terms
- Trinity: a distributed graph engine on a memory cloud