DOI: 10.1145/232973.232989

Increasing cache port efficiency for dynamic superscalar microprocessors

Published: 01 May 1996

ABSTRACT

The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single cache port by using additional buffering in the processor, and by taking maximum advantage of a wider cache port. We evaluate these techniques using realistic applications that include the operating system. Our techniques using a single-ported cache achieve 91% of the performance of a dual-ported cache.

      Published in

        ISCA '96: Proceedings of the 23rd annual international symposium on Computer architecture
        May 1996
        318 pages
        ISBN: 0897917863
        DOI: 10.1145/232973

        ACM SIGARCH Computer Architecture News, Volume 24, Issue 2
        Special Issue: Proceedings of the 23rd annual international symposium on Computer architecture (ISCA '96)
        May 1996
        303 pages
        ISSN: 0163-5964
        DOI: 10.1145/232974

        Copyright © 1996 Authors

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 543 of 3,203 submissions, 17%
