Abstract
We report on a trace-driven simulation study that examines the effect of a two-level cache hierarchy in uniprocessors. A simulation model of a multiple-cycle-per-instruction processor was constructed to estimate the total cycles required to execute a synthetic benchmark. Results show that a second-level cache can be used to increase system performance when main memory access times are long relative to the CPU cycle time. For example, adding a 4-cycle, 64K second-level cache behind a 1-cycle, 8K first-level cache increases performance by 15 percent in a system with a 15-cycle primary memory. Second-level caches are shown to be particularly effective when used behind small on-chip caches: adding an 8K second-level cache to a 1K first-level cache increases performance by 26 percent, assuming similar parameters. We also evaluate the performance impact of different write strategies and of separate instruction (I) and data (D) caches.
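To illustrate why a second-level cache helps when main memory is slow, the sketch below computes average cycles per memory reference with the standard average-access-time formula. It is not the simulation model used in the study. The latencies (1, 4, and 15 cycles) come from the example above; the miss rates are hypothetical placeholders chosen only for illustration, and the function name and the assumption that the second-level latency is paid only on a first-level miss are ours.

```python
def average_access_cycles(l1_cycles, l1_miss_rate, mem_cycles,
                          l2_cycles=None, l2_local_miss_rate=None):
    """Average cycles per memory reference for a one- or two-level hierarchy.

    Assumption: on an L1 miss the L2 latency is added, and on an L2 miss
    the main-memory latency is added on top of that.
    """
    if l2_cycles is None:
        # One-level hierarchy: every L1 miss goes straight to main memory.
        return l1_cycles + l1_miss_rate * mem_cycles
    # Two-level hierarchy: L1 misses go to L2; only L2 misses reach memory.
    l1_miss_penalty = l2_cycles + l2_local_miss_rate * mem_cycles
    return l1_cycles + l1_miss_rate * l1_miss_penalty


# Latencies from the abstract's example; miss rates are HYPOTHETICAL.
L1_MISS_RATE = 0.125        # assumed, not taken from the study
L2_LOCAL_MISS_RATE = 0.25   # assumed, not taken from the study

one_level = average_access_cycles(1, L1_MISS_RATE, 15)
two_level = average_access_cycles(1, L1_MISS_RATE, 15,
                                  l2_cycles=4,
                                  l2_local_miss_rate=L2_LOCAL_MISS_RATE)

print(f"one-level: {one_level:.3f} cycles per reference")
print(f"two-level: {two_level:.3f} cycles per reference")
```

With these assumed miss rates, the two-level configuration spends fewer cycles per reference. The reduction in total execution cycles depends on how many references each instruction makes and on the actual miss rates, which is what the trace-driven simulation measures.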
Index Terms
- A simulation study of two-level caches