ABSTRACT
Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out workloads.
In this work, we introduce CloudSuite, a benchmark suite of emerging scale-out workloads. We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor micro-architecture is inefficient for running these workloads. We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture. Moreover, while today's predominant micro-architecture is inefficient when executing scale-out workloads, we find that continuing the current trends will further exacerbate the inefficiency in the future. In this work, we identify the key micro-architectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.
- Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and David A. Wood. DBMSs on a modern processor: where does time go? In Proceedings of the 25th International Conference on Very Large Data Bases, September 1999. Google ScholarDigital Library
- Alexa, The Web Information Company. http://www.alexa.com/.Google Scholar
- Apache Mahout: scalable machine-learning and data-mining library. http://mahout.apache.org/.Google Scholar
- Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. The PARSEC benchmark suite: characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008. Google ScholarDigital Library
- Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, volume 7, November 2006. Google ScholarDigital Library
- Liviu Ciortea, Cristian Zamfir, Stefan Bucur, Vitaly Chipounov, and George Candea. Cloud9: a software testing service. ACM SIGOPS Operating Systems Review, 43:5--10, January 2010. Google ScholarDigital Library
- Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, June 2010. Google ScholarDigital Library
- John D. Davis, James Laudon, and Kunle Olukotun. Maximizing CMP throughput with mediocre cores. In Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, September 2005. Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, volume 6, November 2004. Google ScholarDigital Library
- Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon's highly available key-value store. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, October 2007. Google ScholarDigital Library
- PowerEdge M1000e Blade Enclosure. http://www.dell.com/us/enterprise/p/poweredge-m1000e/pd.aspx.Google Scholar
- Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. Dark silicon and the end of multicore scaling. In Proceeding of the 38th Annual International Symposium on Computer Architecture, June 2011. Google ScholarDigital Library
- EuroCloud Server. http://www.eurocloudserver.com.Google Scholar
- Stijn Eyerman, Lieven Eeckhout, Tejas Karkhanis, and James E. Smith. A performance counter architecture for computing accurate CPI components. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, October 2006. Google ScholarDigital Library
- Facebook Statistics. https://www.facebook.com/press/info.php?statistics.Google Scholar
- Xiaobo Fan, Wolf-Dietrich Weber, and Luiz André Barroso. Power provisioning for a warehouse-sized computer. In Proceedings of the 34th Annual International Symposium on Computer Architecture, June 2007. Google ScholarDigital Library
- Google Data Centers. http://www.google.com/intl/en/corporate/datacenter/.Google Scholar
- Zvika Guz, Oved Itzhak, Idit Keidar, Avinoam Kolod, Avi Mendelson, and Uri C. Weiser. Threads vs. caches: modeling the behavior of parallel workloads. In International Conference on Computer Design, October 2010.Google ScholarCross Ref
- Rehan Hameed, Wajahat Qadeer, Megan Wachs, Omid Azizi, Alex Solomatnikov, Benjamin C. Lee, Stephen Richardson, Christos Kozyrakis, and Mark Horowitz. Understanding sources of inefficiency in general-purpose chips. In Proceedings of the 37th Annual International Symposium on Computer Architecture, June 2010. Google ScholarDigital Library
- Nikos Hardavellas, Michael Ferdman, Anastasia Ailamaki, and Babak Falsafi. Toward Dark Silicon in Servers. In IEEE Micro, 31(4):6--15, July-Aug, 2011. Google ScholarDigital Library
- Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki. Reactive NUCA: near-optimal block placement and replication in distributed caches. In Proceedings of the 36th Annual International Symposium on Computer Architecture, June 2009. Google ScholarDigital Library
- Nikos Hardavellas, Ippokratis Pandis, Ryan Johnson, Naju Mancheril, Anastassia Ailamaki, and Babak Falsafi. Database servers on chip multiprocessors: limitations and opportunities. In The 3rd Biennial Conference on Innovative Data Systems Research, January 2007.Google Scholar
- Faban Harness and Benchmark Framework. http://java.net/projects/faban/.Google Scholar
- Mark Horowitz, Elad Alon, Dinesh Patil, Samuel Naffziger, Rajesh Kumar, and Kerry Bernstein. Scaling, power, and the future of CMOS. In Electron Devices Meeting, 2005. IEDM Technical Digest. IEEE International, December 2005.Google Scholar
- Shengsheng Huang, Jie Huang, Jinquan Dai, Tao Xie, and Bo Huang. The HiBench benchmark suite: characterization of the MapReduce-based data analysis. In 26th International Conference on Data Engineering Workshops, March 2010.Google ScholarCross Ref
- Intel VTune Amplifier XE Performance Profiler. http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/.Google Scholar
- Tejas S. Karkhanis and James E. Smith. A first-order superscalar processor model. In Proceedings of the 31st Annual International Symposium on Computer Architecture, June 2004. Google ScholarDigital Library
- Kimberly Keeton, David A. Patterson, Yong Qiang He, Roger C. Raphael, and Walter E. Baker. Performance characterization of a quad Pentium Pro SMP using OLTP workloads. In Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998. Google ScholarDigital Library
- Taeho Kgil, Shaun D'Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Trevor Mudge, Steven Reinhardt, and Krisztian Flautner. PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, October 2006. Google ScholarDigital Library
- Christos Kozyrakis, Aman Kansal, Sriram Sankar, and Kushagra Vaid. Server engineering insights for large-scale online services. IEEE Micro, 30(4):8--19, July-Aug, 2010. Google ScholarDigital Library
- Ang Li, Xiaowei Yang, Srikanth Kandula, and Ming Zhang. CloudCmp: comparing public cloud providers. In Proceedings of the 10th Annual Conference on Internet Measurement, November 2010. Google ScholarDigital Library
- Ang Li, Xiaowei Yang, Srikanth Kandula, and Ming Zhang. CloudCMP: Shopping for a cloud made easy. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, June 2010. Google ScholarDigital Library
- Kevin Lim, Parthasarathy Ranganathan, Jichuan Chang, Chandrakant Patel, Trevor Mudge, and Steven Reinhardt. Understanding and designing new server architectures for emerging warehouse-computing environments. In Proceedings of the 35th Annual International Symposium on Computer Architecture, June 2008. Google ScholarDigital Library
- Jack L. Lo, Luiz André Barroso, Susan J. Eggers, Kourosh Gharachorloo, Henry M. Levy, and Sujay S. Parekh. An analysis of database workload performance on simultaneous multithreaded processors. In Proceedings of the 25th Annual International Symposium on Computer Architecture, June 1998. Google ScholarDigital Library
- Open Compute Project. http://opencompute.org/.Google Scholar
- Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, and Luiz André Barroso. Performance of database workloads on shared-memory systems with out-of-order processors. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998. Google ScholarDigital Library
- Parthasarathy Ranganathan and Norman Jouppi. Enterprise IT trends and implications for architecture research. In Proceedings of the 11th International Symposium on High-Performance Computer Architecture, February 2005. Google ScholarDigital Library
- Vijay Janapa Reddi, Benjamin C. Lee, Trishul Chilimbi, and Kushagra Vaid. Web search using mobile cores: quantifying and mitigating the price of efficiency. In Proceedings of the 37th Annual International Symposium on Computer Architecture, June 2010. Google ScholarDigital Library
- SeaMicro Packs 768 Cores Into its Atom Server. http://www.datacenterknowledge.com/archives/2011/07/18/seamicro-packs-768-cores-into-its-atom-server/.Google Scholar
- Will Sobel, Shanti Subramanyam, Akara Sucharitakul, Jimmy Nguyen, Hubert Wong, Arthur Klepchukov, Sheetal Patil, Armando Fox, and David Patterson. Cloudstone: multi-platform, multi-language benchmark and measurement tools for web 2.0. In the 1st Workshop on Cloud Computing and Its Applications, October 2008.Google Scholar
- Vijayaraghavan Soundararajan and Jennifer M. Anderson. The impact of management operations on the virtualized datacenter. In Proceedings of the 37th Annual International Symposium on Computer Architecture, June 2010. Google ScholarDigital Library
- Lingjia Tang, Jason Mars, Veil Vachharajani, Robert Hundt, and Mary Lou Soffa. The impact of memory subsystem resource sharing on datacenter applications. In Proceeding of the 38th Annual International Symposium on Computer Architecture, June 2011. Google ScholarDigital Library
- The Apache Cassandra Project. http://cassandra.apache.org/.Google Scholar
- TPC Transaction Processing Performance Council. http://www.tpc.org/default.asp.Google Scholar
- Nathan Tuck and Dean M. Tullsen. Initial observations of the simultaneous multithreading Pentium 4 processor. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, September 2003. Google ScholarDigital Library
- Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael B. Taylor. Conservation cores: reducing the energy of mature computations. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, March 2010. Google ScholarDigital Library
- Thomas F. Wenisch, Roland E. Wunderlich, Michael Ferdman, Anastassia Ailamaki, Babak Falsafi, and James C. Hoe. Simflex: Statistical sampling of computer system simulation. IEEE Micro, 26:18--31, July 2006. Google ScholarDigital Library
Index Terms
- Clearing the clouds: a study of emerging scale-out workloads on modern hardware
Recommendations
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS '12Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the ...
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
ASPLOS '12Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the ...
Power-efficient distributed scheduling of virtual machines using workload-aware consolidation techniques
There is growing demand on datacenters to serve more clients with reasonable response times, demanding more hardware resources, and higher energy consumption. Energy-aware datacenters have thus been amongst the forerunners to deploy virtualization ...
Comments