Early load address resolution via register tracking

Authors:
Michael Bekerman (HAL Computer Systems and Intel Corporation),
Adi Yoaz (Intel Corporation),
Freddy Gabbay (Mellanox Technologies Inc. and Intel Corporation),
Stephan Jourdan (Intel Corporation),
Maxim Kalaev (Intel Corporation),
Ronny Ronen (Intel Corporation)

ACM SIGARCH Computer Architecture News, Volume 28, Issue 2, May 2000, pp. 306–315. https://doi.org/10.1145/342001.339705

Published: 01 May 2000


Abstract

Higher microprocessor frequencies accentuate the performance cost of memory accesses. This is especially noticeable in Intel's IA32 architecture, where the lack of registers results in an increased number of memory accesses. This paper presents a novel, non-speculative technique that partially hides the increasing load-to-use latency by allowing the early issue of load instructions. Early load address resolution relies on register tracking to safely compute the addresses of memory references in the front-end part of the processor pipeline. Register tracking enables decode-time computation of register values by tracking simple operations of the form reg±immediate. Register tracking may be performed in any pipeline stage following instruction decode and prior to execution.
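
The idea of tracking reg±immediate updates at decode time can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class `RegisterTracker`, its methods, the 32-bit wraparound, and the `sync` step (by which a known value reaches the front end) are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers with wraparound arithmetic


@dataclass
class RegisterTracker:
    """Toy front-end view of register values; None means 'not tracked'."""
    known: Dict[str, Optional[int]] = field(default_factory=dict)

    def sync(self, reg: str, value: int) -> None:
        """Hypothetical step: the front end learns the actual register value."""
        self.known[reg] = value & MASK32

    def add_imm(self, reg: str, imm: int) -> None:
        """Track an update of the form reg <- reg + imm (imm may be negative)."""
        cur = self.known.get(reg)
        self.known[reg] = None if cur is None else (cur + imm) & MASK32

    def clobber(self, reg: str) -> None:
        """Any write that is not reg±immediate leaves the register untracked."""
        self.known[reg] = None

    def load_address(self, base: str, disp: int) -> Optional[int]:
        """Address of a load from [base + disp], or None if base is untracked."""
        cur = self.known.get(base)
        return None if cur is None else (cur + disp) & MASK32


tracker = RegisterTracker()
tracker.sync("esp", 0x1000)      # stack pointer value becomes known
tracker.add_imm("esp", -8)       # e.g. two 4-byte pushes
early_addr = tracker.load_address("esp", 4)   # resolvable before execution
tracker.clobber("eax")           # e.g. eax written by a load: value unknown
late_addr = tracker.load_address("eax", 16)   # cannot be resolved early
```

In this model a load is resolved early only when `load_address` returns a value. Returning `None` instead of guessing mirrors the non-speculative nature of the technique: an untracked base register means the load simply waits for normal address generation.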

Several tracking schemes are proposed in this paper:

Stack pointer tracking allows safe early resolution of stack references by keeping track of the value of the ESP register (the stack pointer). About 25% of all loads are stack loads and 95% of these loads may be resolved in the front-end.
Absolute address tracking allows the early resolution of constant-address loads.
Displacement-based tracking tackles all loads with addresses of the form reg±immediate by tracking the values of all general-purpose registers. This class corresponds to 82% of all loads, and about 65% of these loads can be safely resolved in the front-end pipeline.
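
The percentages quoted above are shares within a class of loads, so the share of all loads resolved early is their product. The short calculation below works this out. It assumes the two quoted figures for each scheme can simply be multiplied, and the helper name `resolved_share` is hypothetical. The classes may overlap (stack loads may also be reg±immediate loads), so the two results should not be added together.

```python
def resolved_share(class_share: float, resolved_within_class: float) -> float:
    """Fraction of ALL loads that a scheme resolves in the front end."""
    return class_share * resolved_within_class


# Stack pointer tracking: about 25% of loads, 95% of those resolved early.
stack_share = resolved_share(0.25, 0.95)

# Displacement-based tracking: 82% of loads, about 65% of those resolved early.
displacement_share = resolved_share(0.82, 0.65)

print(f"stack pointer tracking: {stack_share:.4f} of all loads")
print(f"displacement-based tracking: {displacement_share:.4f} of all loads")
```

Under these assumptions, stack pointer tracking alone resolves roughly 24% of all loads early, and displacement-based tracking roughly 53%.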

The paper describes the tracking schemes, analyzes their performance potential in a deeply pipelined processor and discusses the integration of tracking with memory disambiguation.

Index Terms

Early load address resolution via register tracking
1. Hardware
  1. Hardware validation
  2. Robustness
    1. Fault tolerance
    2. Hardware reliability


Published in

ACM SIGARCH Computer Architecture News, Volume 28, Issue 2, May 2000, 325 pages
Special Issue: Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA '00)
ISSN: 0163-5964
DOI: 10.1145/342001
Chairmen: Alan Berenbaum (Lucent Technologies, Berkeley Heights, NJ), Joel Emer (Compaq Computer Corp., Palo Alto, CA)

ISCA '00: Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000, 327 pages
ISBN: 1581132328
DOI: 10.1145/339647
Chairmen: Alan Berenbaum (Lucent Technologies), Joel Emer (Compaq Computer Corp.)

Copyright © 2000 ACM
Publisher
Association for Computing Machinery
New York, NY, United States