research-article

Data prefetching and address pre-calculation through instruction pre-execution with two-step physical register deallocation

Authors:
Akihiro Yamamoto, Nagoya University
Yusuke Tanaka, Nagoya University
Hideki Ando, Nagoya University
Toshio Shimada, Nagoya University

MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, September 2007, Pages 33–40
https://doi.org/10.1145/1327171.1327175

Published: 16 September 2007

ABSTRACT

This paper proposes an instruction pre-execution scheme that reduces load latency and enables early scheduling of loads in a high-performance processor. The scheme exploits the gap between the instruction-level parallelism available with an unlimited number of physical registers and that available with a realistic number of physical registers. We introduce a technique called two-step physical register deallocation. As a first step, it deallocates physical registers at the renaming stage, which eliminates pipeline stalls caused by a shortage of physical registers. As a second step, instructions wait in the instruction window for the final deallocation. While they wait, the scheme allows them to be pre-executed, which enables prefetching of load data and early calculation of effective memory addresses. In particular, our execution-based scheme is strong at prefetching data with irregular access patterns. Because an automatic prefetcher is strong on regular access patterns, our scheme is best used in combination with one. The evaluation results show that the combined scheme significantly improves performance over a processor with an automatic prefetcher.
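The contrast between regular and irregular access patterns can be illustrated with a small simulation. The sketch below is not the paper's mechanism or its evaluated prefetcher: it assumes a deliberately simple stride predictor as a stand-in for an "automatic prefetcher", and the function name `stride_prefetch_coverage`, the base address, and the element sizes are all hypothetical choices made for illustration. It measures how many accesses such a predictor would have guessed correctly on a strided array walk versus a shuffled, pointer-chasing-like address stream.

```python
import random


def stride_prefetch_coverage(addresses):
    """Return the fraction of accesses whose address a toy stride predictor guessed.

    Assumption: the predictor issues a guess only after seeing the same
    stride twice in a row, and guesses exactly one address ahead.
    """
    last_addr = None
    last_stride = None
    predicted = None
    hits = 0
    for addr in addresses:
        # Count a hit when the previous step predicted this exact address.
        if predicted is not None and addr == predicted:
            hits += 1
        if last_addr is not None:
            stride = addr - last_addr
            # Predict only when the stride repeats; otherwise make no guess.
            predicted = addr + stride if stride == last_stride else None
            last_stride = stride
        last_addr = addr
    return hits / len(addresses)


# Regular pattern: walking an array of 8-byte elements (hypothetical layout).
regular = [0x1000 + 8 * i for i in range(1000)]

# Irregular pattern: visiting 64-byte nodes in a shuffled order, as a
# pointer-chasing traversal might (seeded so the run is repeatable).
rng = random.Random(0)
slots = list(range(1000))
rng.shuffle(slots)
irregular = [0x1000 + 64 * s for s in slots]

regular_coverage = stride_prefetch_coverage(regular)
irregular_coverage = stride_prefetch_coverage(irregular)

print(f"regular coverage:   {regular_coverage:.3f}")
print(f"irregular coverage: {irregular_coverage:.3f}")
```

In this toy model the predictor covers nearly every access of the strided walk once it has warmed up, and almost none of the shuffled stream, because the next address there is only known once the producing instructions have actually executed. That is the gap an execution-based scheme is positioned to fill.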


Published in
MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
September 2007
113 pages
ISBN: 9781595938077
DOI: 10.1145/1327171
Conference Chairs: Pierfrancesco Foglia (University of Pisa), Cosimo Antonio Prete (University of Pisa), Sandro Bartolini (University of Siena), Roberto Giorgi (University of Siena)
Copyright © 2007 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Overall Acceptance Rate: 6 of 9 submissions, 67%