Research article · DOI: 10.1145/2254756.2254778 · SIGMETRICS Conference Proceedings

Temperature management in data centers: why some (might) like it hot

Published: 11 June 2012

ABSTRACT

The energy consumed by data centers is starting to make up a significant fraction of the world's energy consumption and carbon emissions. A large fraction of the consumed energy is spent on data center cooling, which has motivated a large body of work on temperature management in data centers. Interestingly, a key aspect of temperature management has not been well understood: controlling the setpoint temperature at which to run a data center's cooling system. Most data centers set their thermostat based on (conservative) suggestions by manufacturers, as there is limited understanding of how higher temperatures will affect the system. At the same time, studies suggest that increasing the temperature setpoint by just one degree could save 2-5% of the energy consumption. This paper provides a multi-faceted study of temperature management in data centers. We use a large collection of field data from different production environments to study the impact of temperature on hardware reliability, including the reliability of the storage subsystem, the memory subsystem and server reliability as a whole. We also use an experimental testbed based on a thermal chamber and a large array of benchmarks to study two other potential issues with higher data center temperatures: the effect on server performance and power. Based on our findings, we make recommendations for temperature management in data centers that create the potential for saving energy while limiting negative effects on system reliability and performance.
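To make the 2-5%-per-degree figure from the abstract concrete, the sketch below runs the rough arithmetic for a hypothetical facility. The IT load, PUE, and setpoint increase are illustrative assumptions, not values from the paper, and the function name is made up for this example.

```python
# Illustrative sketch (not from the paper): back-of-the-envelope savings
# from raising the cooling setpoint, using the 2-5% per-degree estimate
# cited in the abstract. All facility numbers are hypothetical.

def annual_savings_kwh(it_load_kw, pue, setpoint_increase_c, savings_per_degree):
    """Estimate annual energy saved by raising the cooling setpoint.

    it_load_kw          -- average IT load of the facility in kW (assumed)
    pue                 -- power usage effectiveness of the facility (assumed)
    setpoint_increase_c -- how many degrees C the setpoint is raised
    savings_per_degree  -- fraction of total energy saved per degree (0.02-0.05)
    """
    total_load_kw = it_load_kw * pue                    # total facility draw
    saved_fraction = setpoint_increase_c * savings_per_degree
    return total_load_kw * saved_fraction * 24 * 365    # kWh per year

# Example: 1 MW IT load at PUE 1.8, setpoint raised by 2 C,
# using the low end (2%) of the per-degree estimate.
print(f"{annual_savings_kwh(1000, 1.8, 2, 0.02):,.0f} kWh/year")  # ~630,720 kWh/year
```

Even at the conservative end of the estimate, the saved energy is substantial for a megawatt-scale facility, which is why the setpoint question the paper studies matters.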


Published in

SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, June 2012, 450 pages. ISBN 9781450310970. DOI: 10.1145/2254756.

Also in: ACM SIGMETRICS Performance Evaluation Review, Volume 40, Issue 1, June 2012, 433 pages. ISSN 0163-5999. DOI: 10.1145/2318857.

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 459 of 2,691 submissions, 17%
