research-article

Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems

Authors:
Daniel Hackenberg, Technische Universität Dresden, Dresden, Germany
Daniel Molka, Technische Universität Dresden, Dresden, Germany
Wolfgang E. Nagel, Technische Universität Dresden, Dresden, Germany

MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, December 2009, Pages 413–422
https://doi.org/10.1145/1669112.1669165

Published: 12 December 2009

ABSTRACT

Across a broad range of applications, multicore technology is the most important factor driving today's microprocessor performance improvements. Closely coupled with this is the growing complexity of memory subsystems, whose several cache levels must be exploited efficiently to achieve optimal application performance. Many important implementation details of these memory subsystems are undocumented. We therefore present a set of sophisticated benchmarks for measuring latency and bandwidth to arbitrary locations in the memory subsystem. We consider the coherency state of cache lines to analyze the cache coherency protocols and their performance impact. The potential of our approach is demonstrated with an in-depth comparison of ccNUMA multiprocessor systems with AMD (Shanghai) and Intel (Nehalem-EP) quad-core x86-64 processors, both of which feature integrated memory controllers and coherent point-to-point interconnects. Using our benchmarks, we present fundamental memory performance data and architectural properties of both processors. Our comparison reveals in detail how the microarchitectural differences tremendously affect the performance of the memory subsystem.

Published in
MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
December 2009
601 pages
ISBN: 9781605587981
DOI: 10.1145/1669112
General Chairs: David Albonesi (Cornell), Margaret Martonosi (Princeton)
Program Chairs: David August (Princeton/Parakinetics), José Martínez (Cornell)
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Nehalem
Shanghai
benchmark
coherency
multi-core
Acceptance Rates
Overall Acceptance Rate: 484 of 2,242 submissions, 22%