Abstract
Distributed processing frameworks, such as Yahoo!'s Hadoop and Google's MapReduce, have been successful at harnessing expansive datacenter resources for large-scale data analysis. However, their effect on datacenter energy efficiency has not been scrutinized. Moreover, the filesystem component of these frameworks effectively precludes scale-down (i.e., operating at reduced capacity) of the clusters that deploy them. This paper presents our early work on modifying Hadoop to allow scale-down of operational clusters. We find that running Hadoop clusters in fractional configurations can save between 9% and 50% of energy consumption, and that there is a tradeoff between performance and energy consumption. We also outline further research into the energy efficiency of these frameworks.
Index Terms
- On the energy (in)efficiency of Hadoop clusters