research-article

Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

Authors:
Michael Ferdman, Stony Brook University
Almutaz Adileh, École Polytechnique Fédérale de Lausanne
Onur Kocberber, École Polytechnique Fédérale de Lausanne
Stavros Volos, École Polytechnique Fédérale de Lausanne
Mohammad Alisafaee, École Polytechnique Fédérale de Lausanne
Djordje Jevdjic, École Polytechnique Fédérale de Lausanne
Cansu Kaynak, École Polytechnique Fédérale de Lausanne
Adrian Daniel Popescu, École Polytechnique Fédérale de Lausanne
Anastasia Ailamaki, École Polytechnique Fédérale de Lausanne
Babak Falsafi, École Polytechnique Fédérale de Lausanne

ACM Transactions on Computer Systems, Volume 30, Issue 4, Article No. 15, pp. 1–24. https://doi.org/10.1145/2382553.2382557

Published: 01 November 2012


Abstract

Emerging scale-out workloads require extensive computational resources. However, data centers built with modern server hardware face physical constraints in space and power. These constraints limit further expansion and call for improvements in per-server computational density and in per-operation energy. Continuing to grow the computational resources of the cloud while staying within these constraints requires optimizing server efficiency, so that server hardware closely matches the needs of scale-out workloads.

In this work, we introduce CloudSuite, a benchmark suite of emerging scale-out workloads. We use performance counters on modern servers to study these workloads and find that today's predominant processor microarchitecture runs them inefficiently. The inefficiency stems from a mismatch between the workloads' needs and modern processors, particularly in the organization of the instruction and data memory systems and in the processor core microarchitecture. Moreover, we find that continuing current design trends will make this inefficiency worse in the future. Finally, we identify the key microarchitectural needs of scale-out workloads. These findings call for a change in the trajectory of server processors that would improve computational density and power efficiency in data centers.
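The methodology summarized above relies on hardware performance counters. Raw event totals become comparable across workloads only after they are normalized. Two common derived metrics are instructions per cycle (IPC) and misses per kilo-instruction (MPKI). The sketch below is a minimal illustration that assumes the totals were already collected with a profiler such as Linux perf or Intel VTune. The `CounterSample` type, its field names, and the sample numbers are hypothetical. They are not the article's data, tooling, or exact metric set.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical raw event totals from one measurement interval."""
    cycles: int        # unhalted core cycles
    instructions: int  # instructions retired
    l1i_misses: int    # L1 instruction-cache misses
    llc_misses: int    # last-level cache misses


def ipc(s: CounterSample) -> float:
    """Instructions per cycle: work retired by the core per clock cycle."""
    if s.cycles == 0:
        raise ValueError("cycle count must be nonzero")
    return s.instructions / s.cycles


def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instruction (misses per 1000 retired instructions)."""
    if instructions == 0:
        raise ValueError("instruction count must be nonzero")
    return misses * 1000 / instructions


def summarize(s: CounterSample) -> dict:
    """Derive normalized metrics from one sample of raw counter totals."""
    return {
        "ipc": ipc(s),
        "l1i_mpki": mpki(s.l1i_misses, s.instructions),
        "llc_mpki": mpki(s.llc_misses, s.instructions),
    }


# Illustrative numbers only; not measurements from the article.
sample = CounterSample(cycles=2_000_000_000, instructions=1_500_000_000,
                       l1i_misses=30_000_000, llc_misses=4_500_000)
print(summarize(sample))  # {'ipc': 0.75, 'l1i_mpki': 20.0, 'llc_mpki': 3.0}
```

The sketch normalizes misses per thousand instructions rather than per cycle. This keeps a miss rate independent of how fast the core happens to run, so workloads with very different IPC can be compared directly.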


Index Terms

- General and reference
  - Cross-computing tools and techniques
    - Design



Published in

ACM Transactions on Computer Systems Volume 30, Issue 4
November 2012
136 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/2382553

Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 November 2012
- Accepted: 1 September 2012
- Received: 1 July 2012

Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total citations: 26
- Total downloads: 886
- Downloads (last 12 months): 18
- Downloads (last 6 weeks): 2