research-article

Accelerating critical section execution with asymmetric multi-core architectures

Authors:
M. Aater Suleman, The University of Texas at Austin, Austin, TX, USA
Onur Mutlu, Carnegie Mellon University, Seattle, WA, USA
Moinuddin K. Qureshi, IBM Research, Yorktown Heights, NY, USA
Yale N. Patt, The University of Texas at Austin, Austin, TX, USA

ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, March 2009, Pages 253–264
https://doi.org/10.1145/1508244.1508274

Published: 07 March 2009

ABSTRACT

To improve the performance of a single application on Chip Multiprocessors (CMPs), the application must be split into threads which execute concurrently on multiple cores. In multi-threaded applications, critical sections are used to ensure that only one thread accesses shared data at any given time. Critical sections can serialize the execution of threads, which significantly reduces performance and scalability.
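As a minimal illustration of the kind of critical section the abstract describes, the sketch below guards a shared counter with a lock so that only one thread updates it at a time. The function name `run_workers` and its parameters are hypothetical and chosen only for this example; they do not come from the paper.

```python
import threading


def run_workers(num_threads, increments_per_thread):
    """Increment a shared counter from several threads.

    The `with lock:` block is the critical section: threads that reach
    it while another thread holds the lock must wait, so this part of
    the program executes serially however many threads are started.
    """
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments_per_thread):
            with lock:  # critical section: one thread at a time
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Calling `run_workers(4, 1000)` returns 4000 because the lock makes every update atomic with respect to the other threads. The cost is that all increments are serialized, which is the effect the abstract identifies as a limit on performance and scalability.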

This paper proposes Accelerated Critical Sections (ACS), a technique that leverages the high-performance core(s) of an Asymmetric Chip Multiprocessor (ACMP) to accelerate the execution of critical sections. In ACS, selected critical sections are executed by a high-performance core, which can execute the critical section faster than the other, smaller cores. As a result, ACS reduces serialization: it lowers the likelihood of threads waiting for a critical section to finish. Our evaluation on a set of 12 critical-section-intensive workloads shows that ACS reduces the average execution time by 34% compared to an equal-area 32-core symmetric CMP and by 23% compared to an equal-area ACMP. Moreover, for 7 out of the 12 workloads, ACS improves scalability by increasing the number of threads at which performance saturates.
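The intuition behind ACS can be sketched with a toy, Amdahl-style model. This is not the evaluation methodology of the paper. It assumes the worst case in which every critical section is guarded by the same lock and is therefore fully serialized, it ignores the overhead of moving a critical section to the large core, and it ignores the equal-area constraint under which a large core replaces several small ones. The function names, the parameter names, and the numbers in the usage note are all hypothetical.

```python
def serialized_time(cs_fraction, small_cores):
    """Normalized execution time when critical sections run on small cores.

    cs_fraction: share of single-thread execution time spent inside
                 critical sections (assumed fully serialized).
    small_cores: number of small cores running the parallel part.
    """
    parallel_part = (1.0 - cs_fraction) / small_cores
    return parallel_part + cs_fraction


def accelerated_time(cs_fraction, small_cores, large_core_speedup):
    """Same toy model, with critical sections executed on a large core.

    large_core_speedup: assumed factor by which the large core runs a
                        critical section faster than a small core.
    """
    parallel_part = (1.0 - cs_fraction) / small_cores
    return parallel_part + cs_fraction / large_core_speedup
```

With the illustrative values `cs_fraction=0.1`, `small_cores=16`, and `large_core_speedup=2.0`, the model gives a normalized time of about 0.156 without acceleration and about 0.106 with it. In this model the serialized term does not shrink as cores are added, so it dominates at high core counts, and shortening it is what moves the point at which adding threads stops helping.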

Index Terms

1. Computer systems organization
  1. Architectures


Published in
ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
March 2009, 358 pages
ISBN: 9781605584065
DOI: 10.1145/1508244
General Chair: Mary Lou Soffa, University of Virginia, USA
Program Chair: Mary Jane Irwin, Penn State University, USA

ACM SIGPLAN Notices, Volume 44, Issue 3 (ASPLOS 2009)
March 2009, 346 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1508284

ACM SIGARCH Computer Architecture News, Volume 37, Issue 1 (ASPLOS 2009)
March 2009, 346 pages
ISSN: 0163-5964
DOI: 10.1145/2528521

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cmp
critical sections
heterogeneous cores
locks
multi-core
parallel programming