ABSTRACT
Off-chip memory accesses are a major source of power consumption in embedded processors. To reduce traffic between the processor and off-chip memory, as well as to hide memory latency, nearly all embedded processors place a cache on the same die as the processor core. Because small caches dissipate less power and cost less than large ones, a small cache is preferable to a large one; likewise, because set-associative caches consume more power than direct-mapped caches, a direct-mapped cache is preferable to a set-associative one. Small, direct-mapped caches, however, generally incur many conflict misses. In this paper we propose and evaluate a structure called the Conflict Detection Table (CDT), which can be used to determine whether a memory access is expected to hit in the cache. If a hit is expected but a miss occurs, a conflict is detected and appropriate action can be taken. We also propose two cache structures that employ this technique: the Bypass in Case of Conflict (BCC) cache and the Sub-block in Case of Conflict (SCC) cache. The BCC cache bypasses the cache when a conflict is detected, whereas the SCC cache fetches a sub-block of the missing cache block in such a case. Experimental results on several embedded workloads show that the BCC and SCC caches reduce traffic significantly in many cases while, overall, incurring the same number of cache misses as a direct-mapped cache. The BCC and SCC caches thus reduce power consumption with a negligible loss in performance.
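To make the conflict-detection idea concrete, the following is a minimal simulation sketch. It assumes, for illustration only, that the CDT simply remembers recently seen block addresses and predicts a hit for any block it has seen before; the paper's actual CDT organization, sizing, and indexing are not described in the abstract, so the class and parameter names here (`BCCCache`, `CDT_ENTRIES`, etc.) are hypothetical. On a predicted hit that turns out to be a miss, the sketch applies the BCC policy: the access bypasses the cache instead of evicting the resident block.

```python
# Hypothetical sketch: direct-mapped cache plus a Conflict Detection
# Table (CDT) that triggers cache bypassing on detected conflicts (BCC).
# The CDT model here (a bounded list of recently seen block addresses)
# is an assumption for illustration, not the paper's design.
from collections import deque

BLOCK_BITS = 5     # 32-byte cache blocks (assumed)
NUM_SETS = 64      # 64-entry direct-mapped cache (assumed)
CDT_ENTRIES = 256  # capacity of the hypothetical CDT

class BCCCache:
    def __init__(self):
        self.tags = [None] * NUM_SETS
        self.cdt = deque(maxlen=CDT_ENTRIES)  # recently seen block addresses
        self.stats = {"hit": 0, "miss": 0, "conflict_bypass": 0}

    def access(self, addr):
        block = addr >> BLOCK_BITS
        index = block % NUM_SETS
        tag = block // NUM_SETS

        expected_hit = block in self.cdt  # CDT predicts a hit
        if not expected_hit:
            self.cdt.append(block)

        if self.tags[index] == tag:
            self.stats["hit"] += 1
        elif expected_hit:
            # Predicted hit but missed: a conflict is detected.
            # BCC policy: serve the access from memory without
            # evicting the block currently resident in this set.
            self.stats["conflict_bypass"] += 1
        else:
            self.stats["miss"] += 1
            self.tags[index] = tag  # ordinary miss: allocate the block

cache = BCCCache()
# Two blocks whose addresses differ by NUM_SETS blocks map to the same
# set and would thrash a plain direct-mapped cache; with BCC one of
# them stays resident and keeps hitting while the other bypasses.
A = 0x0000
B = NUM_SETS << BLOCK_BITS  # same index as A, different tag
for _ in range(4):
    cache.access(A)
    cache.access(B)
print(cache.stats)  # → {'hit': 3, 'miss': 2, 'conflict_bypass': 3}
```

The SCC variant would differ only in the conflict branch: instead of bypassing entirely, it would fetch a sub-block of the missing cache block, trading a small amount of traffic for partial allocation.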
Index Terms
- Reducing traffic generated by conflict misses in caches