Abstract
The Quantcast File System (QFS) is an efficient alternative to the Hadoop Distributed File System (HDFS). QFS is written in C++, is plugin-compatible with Hadoop MapReduce, and offers several efficiency improvements relative to HDFS: 50% disk space savings through erasure coding instead of replication, a resulting doubling of write throughput, a faster name node, support for faster sorting and logging through a concurrent append feature, a native command-line client much faster than hadoop fs, and global feedback-directed I/O device management. Because QFS works out of the box with Hadoop, migrating data from HDFS to QFS requires only running hadoop distcp. QFS is developed as fully open-source software and is available under an Apache license from https://github.com/quantcast/qfs. Multi-petabyte QFS instances have been in heavy production use since 2011.
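The 50% figure can be checked with a little arithmetic. The sketch below assumes 3-way replication as the baseline and a Reed-Solomon layout of 6 data stripes plus 3 parity stripes for erasure coding; neither parameter is stated in the abstract, and the function names are illustrative only.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data under n-way replication."""
    return Fraction(copies)


def erasure_overhead(data_stripes: int, parity_stripes: int) -> Fraction:
    """Raw bytes stored per byte of user data under a data+parity erasure code."""
    return Fraction(data_stripes + parity_stripes, data_stripes)


# Assumed parameters (not given in the abstract):
replication = replication_overhead(3)   # 3 full copies of every block
erasure = erasure_overhead(6, 3)        # 6 data stripes + 3 parity stripes

# Fraction of raw disk space saved by switching from replication to erasure coding.
savings = 1 - erasure / replication

print(f"replication overhead: {float(replication):.1f}x")
print(f"erasure overhead:     {float(erasure):.1f}x")
print(f"disk space savings:   {float(savings):.0%}")
```

Under these assumptions each byte of user data costs 1.5 bytes of raw storage instead of 3, so half as much data has to be written, which is consistent with the doubling of write throughput that the abstract describes as a result of the space savings.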