
The Quantcast File System

Published: 01 August 2013

Abstract

The Quantcast File System (QFS) is an efficient alternative to the Hadoop Distributed File System (HDFS). QFS is written in C++, is plugin-compatible with Hadoop MapReduce, and offers several efficiency improvements relative to HDFS: 50% disk space savings through erasure coding instead of replication, a resulting doubling of write throughput, a faster name node, support for faster sorting and logging through a concurrent append feature, a native command-line client much faster than hadoop fs, and global feedback-directed I/O device management. As QFS works out of the box with Hadoop, migrating data from HDFS to QFS involves simply executing hadoop distcp. QFS is being developed fully open source and is available under an Apache license from https://github.com/quantcast/qfs. Multi-petabyte QFS instances have been in heavy production use since 2011.
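
The 50% disk space figure and the doubling of write volume can be checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions that the abstract itself does not state: that the HDFS baseline keeps 3 full replicas of each block, and that the erasure code is Reed-Solomon with 6 data stripes and 3 parity stripes. The function names are hypothetical and are not part of any QFS or Hadoop API.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per logical byte when keeping `copies` full replicas."""
    return Fraction(copies)


def erasure_overhead(data_stripes: int, parity_stripes: int) -> Fraction:
    """Raw bytes stored per logical byte for a (data + parity) stripe layout."""
    return Fraction(data_stripes + parity_stripes, data_stripes)


def savings(baseline: Fraction, candidate: Fraction) -> Fraction:
    """Fraction of raw storage saved by `candidate` relative to `baseline`."""
    return 1 - candidate / baseline


# Assumed parameters (not given in the abstract).
REPLICAS = 3            # assumed replication factor of the baseline
DATA_STRIPES = 6        # assumed Reed-Solomon data stripes
PARITY_STRIPES = 3      # assumed Reed-Solomon parity stripes

rep = replication_overhead(REPLICAS)
ec = erasure_overhead(DATA_STRIPES, PARITY_STRIPES)

print(f"replication:      {float(rep):.1f}x raw bytes per logical byte")
print(f"erasure coding:   {float(ec):.1f}x raw bytes per logical byte")
print(f"disk space saved: {float(savings(rep, ec)):.0%}")
print(f"bytes written, replication vs erasure coding: {float(rep / ec):.1f}x")
```

Under these assumptions each logical byte costs 3 raw bytes with replication and 1.5 with erasure coding, so half the disk space is saved and half as many bytes are written. The ratio covers bytes written only; real write throughput also depends on network, disk, and encoding costs that this sketch ignores.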



Published in: Proceedings of the VLDB Endowment, Volume 6, Issue 11, August 2013, 237 pages
Publisher: VLDB Endowment
