DOI: 10.1145/2000064.2000108

Dark silicon and the end of multicore scaling

Published: 4 June 2011

ABSTRACT

Since 2005, processor designers have increased core counts to exploit Moore's Law scaling, rather than focusing on single-core performance. The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling just as single-core scaling has been curtailed. This paper models multicore scaling limits by combining device scaling, single-core scaling, and multicore scaling to measure the speedup potential for a set of parallel workloads for the next five technology generations. For device scaling, we use both the ITRS projections and a set of more conservative device scaling parameters. To model single-core scaling, we combine measurements from over 150 processors to derive Pareto-optimal frontiers for area/performance and power/performance. Finally, to model multicore scaling, we build a detailed performance model of upper-bound performance and lower-bound core power. The multicore designs we study include single-threaded CPU-like and massively threaded GPU-like multicore chip organizations with symmetric, asymmetric, dynamic, and composed topologies. The study shows that regardless of chip organization and topology, multicore scaling is power limited to a degree not widely appreciated by the computing community. Even at 22 nm (just one year from now), 21% of a fixed-size chip must be powered off, and at 8 nm, this number grows to more than 50%. Through 2024, only 7.9x average speedup is possible across commonly used parallel workloads, leaving a nearly 24-fold gap from a target of doubled performance per generation.
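
The idea of a power-limited chip can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes identical cores, a fixed area budget, and a fixed power budget, and it applies plain Amdahl's law to whatever cores can be powered at once. The function names `amdahl_speedup` and `power_limited_chip` and every number in the example are hypothetical, chosen only to show how a power budget smaller than the area budget leaves part of the chip dark.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law speedup for `cores` identical cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def power_limited_chip(area_budget, power_budget, core_area, core_power):
    """Return (cores that fit, cores that can be powered, dark fraction).

    Simplifying assumption: every core has the same area and power cost,
    and unpowered cores consume nothing.
    """
    fit = int(area_budget // core_area)
    powered = min(fit, int(power_budget // core_power))
    dark_fraction = 1.0 - powered / fit if fit else 0.0
    return fit, powered, dark_fraction


# Hypothetical budgets in arbitrary units (not values from the paper).
fit, powered, dark = power_limited_chip(
    area_budget=100.0, power_budget=80.0, core_area=1.0, core_power=1.0
)
speedup = amdahl_speedup(parallel_fraction=0.99, cores=powered)

print(f"cores that fit: {fit}, cores powered: {powered}")
print(f"dark fraction: {dark:.0%}, speedup: {speedup:.1f}x")
```

In this toy setup the area budget admits 100 cores but the power budget feeds only 80, so a fifth of the chip stays dark and the speedup is computed over the powered cores alone.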

Published in

ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
June 2011
488 pages
ISBN: 9781450304726
DOI: 10.1145/2000064

ACM SIGARCH Computer Architecture News, Volume 39, Issue 3
ISCA '11
June 2011
462 pages
ISSN: 0163-5964
DOI: 10.1145/2024723

Copyright © 2011 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

research-article

Acceptance Rates

Overall acceptance rate: 543 of 3,203 submissions, 17%