ABSTRACT
The energy consumed by data centers is starting to make up a significant fraction of the world's energy consumption and carbon emissions. A large fraction of that energy is spent on data center cooling, which has motivated a large body of work on temperature management in data centers. Interestingly, a key aspect of temperature management is still not well understood: choosing the setpoint temperature at which to run a data center's cooling system. Most data centers set their thermostat based on (conservative) suggestions by manufacturers, as there is limited understanding of how higher temperatures affect the system. At the same time, studies suggest that increasing the temperature setpoint by just one degree could save 2-5% of the energy consumption. This paper provides a multi-faceted study of temperature management in data centers. We use a large collection of field data from different production environments to study the impact of temperature on hardware reliability, including the reliability of the storage subsystem, the memory subsystem, and the server as a whole. We also use an experimental testbed based on a thermal chamber and a large array of benchmarks to study two other potential issues with higher data center temperatures: the effects on server performance and power. Based on our findings, we make recommendations for temperature management in data centers that create the potential for saving energy while limiting negative effects on system reliability and performance.
Temperature management in data centers: why some (might) like it hot
Performance Evaluation Review