Abstract
Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out workloads.
In this work, we introduce CloudSuite, a benchmark suite of emerging scale-out workloads. We use performance counters on modern servers to study scale-out workloads, finding that today’s predominant processor microarchitecture is inefficient for running these workloads. The inefficiency comes from a mismatch between the needs of the workloads and the design of modern processors, particularly in the organization of the instruction and data memory systems and in the processor core microarchitecture. Moreover, we find that continuing current trends will further exacerbate this inefficiency. We identify the key microarchitectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.
Index Terms
- Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors