
Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks

Published: 03 November 2014

ABSTRACT

Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique, into the storage layer. The key challenge in making a long-running lineage-based storage system is timely data recovery in case of failures. Tachyon addresses this issue by introducing a checkpointing algorithm that guarantees bounded recovery cost and resource allocation strategies for recomputation under commonly used resource schedulers. Our evaluation shows that Tachyon outperforms in-memory HDFS by 110x for writes. It also improves the end-to-end latency of a realistic workflow by 4x. Tachyon is open source and is deployed at multiple companies.
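
To make the abstract's core idea concrete, here is a minimal, hypothetical Python sketch (Tachyon itself is a Java system, and none of these names come from its actual API): the storage layer records, for each output file, the deterministic job and the input files that produced it, so a lost in-memory copy can be recomputed from lineage rather than recovered from a synchronously written replica.

    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Lineage:
        inputs: List[str]                    # files the job read
        job: Callable[[List[bytes]], bytes]  # deterministic recompute function

    class LineageStore:
        """Illustrative stand-in for a lineage-based storage layer."""

        def __init__(self) -> None:
            self.data: Dict[str, bytes] = {}       # in-memory file contents
            self.lineage: Dict[str, Lineage] = {}  # how each file was produced

        def write(self, path: str, job: Callable[[List[bytes]], bytes],
                  inputs: List[str]) -> None:
            # Record lineage and run the job; the output is not
            # synchronously replicated over the network or to disk.
            self.lineage[path] = Lineage(inputs, job)
            self.data[path] = job([self.read(p) for p in inputs])

        def read(self, path: str) -> bytes:
            if path not in self.data:    # lost, e.g., after a worker failure
                self._recompute(path)    # recover transparently via lineage
            return self.data[path]

        def _recompute(self, path: str) -> None:
            # Recursively recover any missing inputs, then re-run the job.
            # (Base data with no lineage is assumed to survive elsewhere.)
            lin = self.lineage[path]
            self.data[path] = lin.job([self.read(p) for p in lin.inputs])

    # Usage: lose a derived file, then read it back via recomputation.
    store = LineageStore()
    store.data["raw"] = b"1 2 3"         # base data, loaded directly
    store.write("sum",
                lambda ins: str(sum(map(int, ins[0].split()))).encode(),
                ["raw"])
    del store.data["sum"]                # simulate losing the in-memory copy
    assert store.read("sum") == b"6"     # transparently recomputed

The sketch deliberately omits what the paper contributes on top of this basic idea: an asynchronous checkpointing algorithm that bounds how deep such a recomputation can recurse, and hence bounds recovery cost, plus resource allocation policies so that recomputation jobs respect the priorities of common cluster schedulers.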


Published in

SOCC '14: Proceedings of the ACM Symposium on Cloud Computing
November 2014, 383 pages
ISBN: 9781450332521
DOI: 10.1145/2670979 (article DOI: 10.1145/2670979.2670985)
Copyright © 2014 ACM
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 169 of 722 submissions, 23%
