research-article

Trinity: a distributed graph engine on a memory cloud

Authors:
Bin Shao, Microsoft Research Asia, Beijing, China
Haixun Wang, Microsoft Research Asia, Beijing, China
Yatao Li, HKUST, Hong Kong, Hong Kong
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, June 2013, Pages 505–516. https://doi.org/10.1145/2463676.2467799

Published: 22 June 2013

ABSTRACT

Computations performed by graph algorithms are data driven and require a high degree of random data access. Despite the great progress made in disk technology, disks still cannot provide the level of efficient random access required by graph computation. On the other hand, memory-based approaches usually do not scale because of the capacity limit of single machines. In this paper, we introduce Trinity, a general-purpose graph engine over a distributed memory cloud. Through optimized memory management and network communication, Trinity supports fast graph exploration as well as efficient parallel computing. In particular, Trinity leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance. These optimizations enable Trinity to support efficient online query processing and offline analytics on large graphs with just a few commodity machines. Furthermore, Trinity provides a high-level specification language called TSL for users to declare data schema and communication protocols, which makes general-purpose graph management and computing easier to use. Our experiments show Trinity's performance in both low-latency graph queries and high-throughput graph analytics on web-scale, billion-node graphs.

Published in
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
June 2013
1322 pages
ISBN: 9781450320375
DOI: 10.1145/2463676
General Chairs: Kenneth Ross (Columbia University), Divesh Srivastava (AT&T Research)
Program Chair: Dimitris Papadias (HKUST)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distributed system
graph database
memory cloud
Acceptance Rates
SIGMOD '13 Paper Acceptance Rate: 76 of 372 submissions, 20%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%