Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks

Authors:
Haoyuan Li (University of California, Berkeley)
Ali Ghodsi (University of California, Berkeley)
Matei Zaharia (MIT, Databricks)
Scott Shenker (University of California, Berkeley)
Ion Stoica (University of California, Berkeley)

SOCC '14: Proceedings of the ACM Symposium on Cloud Computing, November 2014, Pages 1–15. https://doi.org/10.1145/2670979.2670985

Published: 3 November 2014

ABSTRACT

Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique, into the storage layer. The key challenge in making a long-running lineage-based storage system is timely data recovery in case of failures. Tachyon addresses this issue by introducing a checkpointing algorithm that guarantees bounded recovery cost and resource allocation strategies for recomputation under commonly used resource schedulers. Our evaluation shows that Tachyon outperforms in-memory HDFS by 110x for writes. It also improves the end-to-end latency of a realistic workflow by 4x. Tachyon is open source and is deployed at multiple companies.
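
To make the lineage idea in the abstract concrete, the following is a minimal sketch in Python. It is not Tachyon's API or its checkpointing algorithm: the class `LineageStore`, its methods, and the file names are all hypothetical and exist only for illustration. The sketch assumes that each derived file records the function and inputs that produced it, that source data is already durable, and that a lost file is rebuilt by re-running its recorded computation back to the nearest surviving copy.

```python
from typing import Callable, Dict, List, Tuple


class LineageStore:
    """Toy store (hypothetical, not Tachyon's API) that keeps one in-memory
    copy of each file plus the recipe needed to rebuild it."""

    def __init__(self) -> None:
        self.memory: Dict[str, bytes] = {}       # volatile, lost on failure
        self.checkpoints: Dict[str, bytes] = {}  # stand-in for durable storage
        self.lineage: Dict[str, Tuple[Callable[..., bytes], List[str]]] = {}
        self.recomputations = 0                  # counts re-executed steps

    def put_source(self, name: str, data: bytes) -> None:
        # Assumption: source data is already durable somewhere.
        self.checkpoints[name] = data
        self.memory[name] = data

    def write(self, name: str, fn: Callable[..., bytes], inputs: List[str]) -> None:
        # Record how the file is derived, then keep a single in-memory copy.
        self.lineage[name] = (fn, list(inputs))
        self.memory[name] = fn(*(self.read(i) for i in inputs))

    def checkpoint(self, name: str) -> None:
        # Persisting a file cuts the lineage chain that recovery must replay.
        self.checkpoints[name] = self.read(name)

    def lose_memory(self) -> None:
        # Simulate a failure that wipes every in-memory copy.
        self.memory.clear()

    def read(self, name: str) -> bytes:
        if name in self.memory:
            return self.memory[name]
        if name in self.checkpoints:
            data = self.checkpoints[name]
        else:
            # Not in memory and not checkpointed: recompute from lineage.
            fn, inputs = self.lineage[name]
            self.recomputations += 1
            data = fn(*(self.read(i) for i in inputs))
        self.memory[name] = data
        return data


store = LineageStore()
store.put_source("raw", b"a b a")
store.write("upper", lambda d: d.upper(), ["raw"])
store.write("doubled", lambda d: d + b" " + d, ["upper"])
store.lose_memory()
print(store.read("doubled"), store.recomputations)
```

In this toy model a write touches only memory, and the cost of that choice appears at recovery time as the number of steps that must be replayed. Calling `checkpoint("upper")` before the failure shortens the replay, which is the trade-off a checkpointing policy has to manage.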


Published in
SOCC '14: Proceedings of the ACM Symposium on Cloud Computing
November 2014
383 pages
ISBN: 9781450332521
DOI: 10.1145/2670979
Conference Chairs: Ed Lazowska, Doug Terry
Program Chairs: Remzi H. Arpaci-Dusseau, Johannes Gehrke
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 169 of 722 submissions, 23%