DOI: 10.1145/1508244.1508274
research-article

Accelerating critical section execution with asymmetric multi-core architectures

Published: 07 March 2009

ABSTRACT

To improve the performance of a single application on Chip Multiprocessors (CMPs), the application must be split into threads which execute concurrently on multiple cores. In multi-threaded applications, critical sections are used to ensure that only one thread accesses shared data at any given time. Critical sections can serialize the execution of threads, which significantly reduces performance and scalability.
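As a minimal illustration of the term, the sketch below shows a critical section in Python: several threads update one shared counter, and a lock ensures that only one thread performs the update at a time. The function name `run_counter` and its parameters are hypothetical and chosen for this example only; they do not come from the paper.

```python
import threading


def run_counter(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads (hypothetical example)."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Critical section: only one thread at a time may execute this
            # block, so every other thread that reaches it must wait.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Each thread that finds the lock held has to wait for the holder to finish, which is the serialization the abstract refers to.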

This paper proposes Accelerated Critical Sections (ACS), a technique that leverages the high-performance core(s) of an Asymmetric Chip Multiprocessor (ACMP) to accelerate the execution of critical sections. In ACS, selected critical sections are executed by a high-performance core, which can execute the critical section faster than the other, smaller cores. As a result, ACS reduces serialization: it lowers the likelihood of threads waiting for a critical section to finish. Our evaluation on a set of 12 critical-section-intensive workloads shows that ACS reduces the average execution time by 34% compared to an equal-area 32-core symmetric CMP and by 23% compared to an equal-area ACMP. Moreover, for 7 out of the 12 workloads, ACS improves scalability by increasing the number of threads at which performance saturates.
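The abstract does not say how a critical section reaches the high-performance core. As a loose software analogy only, and not the paper's mechanism, the sketch below hands every critical section to a single dedicated thread that stands in for the fast core, while worker threads stand in for the small cores. All names here (`CriticalSectionServer`, `execute`, `demo`) are hypothetical, and a Python thread does not actually run faster than the others; the sketch shows only the delegation pattern.

```python
import queue
import threading
from concurrent.futures import Future


class CriticalSectionServer:
    """Hypothetical stand-in for a fast core: runs submitted critical sections one at a time."""

    def __init__(self) -> None:
        self._requests: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self) -> None:
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            fn, fut = item
            try:
                fut.set_result(fn())
            except Exception as exc:  # hand the failure back to the caller
                fut.set_exception(exc)

    def execute(self, fn):
        """Send a critical section to the server thread and wait for its result."""
        fut: Future = Future()
        self._requests.put((fn, fut))
        return fut.result()

    def shutdown(self) -> None:
        self._requests.put(None)
        self._thread.join()


def demo(num_threads: int = 4, increments: int = 1000) -> int:
    server = CriticalSectionServer()
    state = {"counter": 0}

    def bump() -> int:
        # Runs only on the server thread, so no lock is needed here.
        state["counter"] += 1
        return state["counter"]

    def worker() -> None:
        for _ in range(increments):
            server.execute(bump)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    return state["counter"]
```

Because the shared state is touched by one thread only, the workers need no lock of their own; they still wait for their request to complete, so any benefit depends on the executing core finishing each critical section sooner, which is the premise the abstract states for ACS.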

    • Published in

      ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
      March 2009
      358 pages
      ISBN: 9781605584065
      DOI: 10.1145/1508244
      • ACM SIGPLAN Notices, Volume 44, Issue 3
        ASPLOS 2009
        March 2009
        346 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/1508284
      • ACM SIGARCH Computer Architecture News, Volume 37, Issue 1
        ASPLOS 2009
        March 2009
        346 pages
        ISSN: 0163-5964
        DOI: 10.1145/2528521

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 535 of 2,713 submissions, 20%
