Increasing cache port efficiency for dynamic superscalar microprocessors

Authors:
Kenneth M. Wilson, Computer Systems Laboratory, Stanford University, Stanford, CA
Kunle Olukotun, Computer Systems Laboratory, Stanford University, Stanford, CA
Mendel Rosenblum, Computer Systems Laboratory, Stanford University, Stanford, CA

ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture, May 1996, Pages 147–157, https://doi.org/10.1145/232973.232989

Published: 1 May 1996

ABSTRACT

The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single cache port by using additional buffering in the processor, and by taking maximum advantage of a wider cache port. We evaluate these techniques using realistic applications that include the operating system. Our techniques using a single-ported cache achieve 91% of the performance of a dual-ported cache.
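
The abstract does not spell out the mechanisms, so the following is only a minimal sketch of the general idea, under stated assumptions: a single cache port returns one aligned chunk per access, and a small buffer in the processor holds recently fetched chunks so that later loads to the same chunk do not use the port again. The function name `port_accesses`, its parameters, the LRU policy, and the example address stream are all hypothetical and are not taken from the paper; stores are ignored.

```python
from collections import OrderedDict


def port_accesses(addresses, port_width_bytes=16, buffer_entries=1):
    """Count cache-port accesses for a stream of load addresses.

    Hypothetical model (an assumption, not the paper's design): each port
    access returns one aligned chunk of port_width_bytes, which is kept in
    a small LRU buffer. A load whose chunk is buffered skips the port.
    """
    buffered = OrderedDict()  # chunk number -> None, ordered by recency
    accesses = 0
    for addr in addresses:
        chunk = addr // port_width_bytes
        if chunk in buffered:
            buffered.move_to_end(chunk)  # buffer hit: refresh recency only
            continue
        accesses += 1  # buffer miss: the chunk comes through the port
        buffered[chunk] = None
        if len(buffered) > buffer_entries:
            buffered.popitem(last=False)  # evict the least recently used chunk
    return accesses


# Illustrative byte addresses of 4-byte loads (made up for this example).
loads = [0, 4, 8, 12, 16, 20, 64, 68]
narrow = port_accesses(loads, port_width_bytes=4, buffer_entries=1)
wide = port_accesses(loads, port_width_bytes=16, buffer_entries=1)
print(narrow, wide)
```

In this toy stream the wider port with a one-entry buffer needs fewer port accesses than the narrow one because neighbouring loads share a chunk. The 91% figure in the abstract comes from the authors' own evaluation and is not reproduced by this model.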

Index Terms

Increasing cache port efficiency for dynamic superscalar microprocessors
1. Hardware
  1. Hardware test
    1. Test-pattern generation and fault simulation
  2. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture
May 1996
318 pages
ISBN: 0897917863
DOI: 10.1145/232973
Chairman: Jean-Loup Baer, Univ. of Washington, Seattle

ACM SIGARCH Computer Architecture News, Volume 24, Issue 2
Special Issue: Proceedings of the 23rd annual international symposium on Computer architecture (ISCA '96)
May 1996
303 pages
ISSN: 0163-5964
DOI: 10.1145/232974
Chairman: Jean-Loup Baer, Univ. of Washington, Seattle

Copyright © 1996 Authors
Publisher: Association for Computing Machinery, New York, NY, United States
Qualifiers: Article, Conference
Overall Acceptance Rate: 543 of 3,203 submissions, 17%