Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors

Published: 01 November 2012

Abstract

Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out workloads.

In this work, we introduce CloudSuite, a benchmark suite of emerging scale-out workloads. We use performance counters on modern servers to study scale-out workloads, finding that today’s predominant processor microarchitecture is inefficient for running these workloads. We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core microarchitecture. Moreover, while today’s predominant microarchitecture is inefficient when executing scale-out workloads, we find that continuing the current trends will further exacerbate the inefficiency in the future. In this work, we identify the key microarchitectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.

    • Published in

      ACM Transactions on Computer Systems, Volume 30, Issue 4
      November 2012
      136 pages
      ISSN: 0734-2071
      EISSN: 1557-7333
      DOI: 10.1145/2382553

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 November 2012
      • Accepted: 1 September 2012
      • Received: 1 July 2012

      Qualifiers

      • research-article
      • Research
      • Refereed
