DOI: 10.1145/1327171.1327175
research-article

Data prefetching and address pre-calculation through instruction pre-execution with two-step physical register deallocation

Published: 16 September 2007

ABSTRACT

This paper proposes an instruction pre-execution scheme for high-performance processors that reduces latency and enables early scheduling of loads. The scheme exploits the gap between the instruction-level parallelism available with an unlimited number of physical registers and that available with a realistic number of physical registers. We introduce a technique called two-step physical register deallocation. As a first step, it deallocates physical registers at the renaming stage, which eliminates pipeline stalls caused by a shortage of physical registers. As a second step, instructions wait in the instruction window for the final deallocation. While they wait, the scheme allows them to be pre-executed, which enables prefetching of load data and early calculation of effective memory addresses. Because it is execution-based, our scheme is particularly strong at prefetching data with irregular access patterns. An automatic prefetcher, by contrast, is strong on regular access patterns, so combining the two makes the best use of our scheme. The evaluation results show that the combined scheme significantly improves performance over a processor with an automatic prefetcher.
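
The contrast between stalling at rename and continuing to fill the instruction window can be illustrated with a deliberately small toy model. This is a sketch under strong assumptions, not the paper's mechanism: every instruction is assumed to need one destination register, no register is freed while a long-latency miss blocks commit, and every load that enters the window without a register is assumed to be pre-executed as a prefetch. The names `Instr`, `conventional_rename`, and `toy_two_step_rename` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    kind: str                   # "load" or "alu"
    addr: Optional[int] = None  # effective address (loads only)


def conventional_rename(stream: List[Instr], num_phys_regs: int) -> List[Instr]:
    """Toy baseline: rename in order and stall once the free list is empty.

    Assumes one destination register per instruction and no register
    freed while commit is blocked.
    """
    return stream[:num_phys_regs]


def toy_two_step_rename(
    stream: List[Instr], num_phys_regs: int, window_size: int
) -> Tuple[List[Instr], List[Instr], List[int]]:
    """Toy model (an assumption, not the published design): rename never
    stalls on a register shortage, so the window fills up to window_size.

    Instructions beyond the register budget wait in the window; loads
    among them are treated as pre-executed, i.e. their addresses are
    computed early and issued as prefetches.
    """
    in_window = stream[:window_size]
    holders = in_window[:num_phys_regs]   # instructions that own a register
    waiting = in_window[num_phys_regs:]   # instructions waiting in the window
    prefetched = [i.addr for i in waiting if i.kind == "load"]
    return holders, waiting, prefetched


# Hypothetical instruction stream for illustration only.
stream = [
    Instr("load", 0x100),
    Instr("alu"),
    Instr("load", 0x200),
    Instr("alu"),
    Instr("load", 0x300),
    Instr("load", 0x400),
]

baseline = conventional_rename(stream, num_phys_regs=2)
holders, waiting, prefetched = toy_two_step_rename(
    stream, num_phys_regs=2, window_size=5
)
```

In this toy setup the baseline stops after two instructions, while the second function lets three more enter the window and treats the two loads among them as prefetch candidates. The model ignores dependences, timing, and how results are later produced for real, all of which the abstract leaves to the full paper.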

  • Published in

    ACM Conferences
    MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
    September 2007
    113 pages
    ISBN: 9781595938077
    DOI: 10.1145/1327171

    Copyright © 2007 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall acceptance rate: 6 of 9 submissions, 67%
