ABSTRACT
Off-chip memory accesses are a major source of power consumption in embedded processors. To reduce traffic between the processor and off-chip memory, as well as to hide memory latency, nearly all embedded processors place a cache on the same die as the processor core. Because small caches dissipate less power and cost less than large ones, a small cache is preferable to a large one; likewise, because set-associative caches consume more power than direct-mapped caches, a direct-mapped cache is preferable to a set-associative one. Small, direct-mapped caches, however, generally incur many conflict misses. In this paper we propose and evaluate a structure called the Conflict Detection Table (CDT), which can be used to determine whether a memory access is expected to hit in the cache. If a hit is expected but a miss occurs, a conflict is detected and appropriate action can be taken. We also propose two cache structures that employ this technique: the Bypass in Case of Conflict (BCC) cache and the Sub-block in Case of Conflict (SCC) cache. The BCC cache bypasses the cache when a conflict is detected, whereas the SCC cache fetches a sub-block of the missing cache block in such a case. Experimental results on several embedded workloads show that the BCC and SCC caches reduce traffic significantly in many cases while, overall, incurring the same number of cache misses as a direct-mapped cache. The BCC and SCC caches thus reduce power consumption with a negligible loss in performance.
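To make the conflict-detection idea concrete, the following is a minimal simulation sketch. It assumes, for illustration only, that the CDT simply remembers recently seen block addresses and predicts a hit for any block it has seen before; the paper's actual CDT organization, sizing, and indexing are not described in the abstract, so the class and parameter names here (`BCCCache`, `CDT_ENTRIES`, etc.) are hypothetical. On a predicted hit that turns out to be a miss, the sketch applies the BCC policy: the access bypasses the cache instead of evicting the resident block.

```python
# Hypothetical sketch: direct-mapped cache plus a Conflict Detection
# Table (CDT) that triggers cache bypassing on detected conflicts (BCC).
# The CDT model here (a bounded list of recently seen block addresses)
# is an assumption for illustration, not the paper's design.
from collections import deque

BLOCK_BITS = 5     # 32-byte cache blocks (assumed)
NUM_SETS = 64      # 64-entry direct-mapped cache (assumed)
CDT_ENTRIES = 256  # capacity of the hypothetical CDT

class BCCCache:
    def __init__(self):
        self.tags = [None] * NUM_SETS
        self.cdt = deque(maxlen=CDT_ENTRIES)  # recently seen block addresses
        self.stats = {"hit": 0, "miss": 0, "conflict_bypass": 0}

    def access(self, addr):
        block = addr >> BLOCK_BITS
        index = block % NUM_SETS
        tag = block // NUM_SETS

        expected_hit = block in self.cdt  # CDT predicts a hit
        if not expected_hit:
            self.cdt.append(block)

        if self.tags[index] == tag:
            self.stats["hit"] += 1
        elif expected_hit:
            # Predicted hit but missed: a conflict is detected.
            # BCC policy: serve the access from memory without
            # evicting the block currently resident in this set.
            self.stats["conflict_bypass"] += 1
        else:
            self.stats["miss"] += 1
            self.tags[index] = tag  # ordinary miss: allocate the block

cache = BCCCache()
# Two blocks whose addresses differ by NUM_SETS blocks map to the same
# set and would thrash a plain direct-mapped cache; with BCC one of
# them stays resident and keeps hitting while the other bypasses.
A = 0x0000
B = NUM_SETS << BLOCK_BITS  # same index as A, different tag
for _ in range(4):
    cache.access(A)
    cache.access(B)
print(cache.stats)  # → {'hit': 3, 'miss': 2, 'conflict_bypass': 3}
```

The SCC variant would differ only in the conflict branch: instead of bypassing entirely, it would fetch a sub-block of the missing cache block, trading a small amount of traffic for partial allocation.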
Index Terms
- Reducing traffic generated by conflict misses in caches