ABSTRACT
This paper proposes an instruction pre-execution scheme for high-performance processors that reduces latency and enables early scheduling of loads. The scheme exploits the gap between the instruction-level parallelism available with an unlimited number of physical registers and that available with the actual number of physical registers. We introduce a technique called two-step physical register deallocation. In the first step, physical registers are deallocated at the renaming stage, which eliminates pipeline stalls caused by a shortage of physical registers. In the second step, instructions wait in the instruction window for the final deallocation. While they wait, the scheme allows them to be pre-executed, which enables prefetching of load data and early calculation of effective memory addresses. Because it is execution-based, our scheme is particularly good at prefetching data with irregular access patterns. Since an automatic prefetcher handles regular access patterns well, our scheme is best used in combination with one. The evaluation results show that the combined scheme significantly improves performance over a processor with only an automatic prefetcher.