Abstract
Modern superscalar processors often suffer long stalls because of load misses in on-chip L2 caches. To address this problem, we propose hiding L2 misses with Checkpoint-Assisted VAlue prediction (CAVA). On an L2 cache miss, a predicted value is returned to the processor. When the missing load finally reaches the head of the ROB, the processor checkpoints its state, retires the load with the predicted value, and speculatively continues execution. When the value arrives from memory at the L2 cache, it is compared against the predicted value. If the prediction was correct, speculation has succeeded and execution continues; otherwise, execution is rolled back and restarted from the checkpoint. CAVA relies on fast checkpointing, speculative buffering, and a modest-sized value-prediction structure with about 50% accuracy. Compared to an aggressive superscalar processor, CAVA speeds up execution by up to 1.45 for SPECint applications and 1.58 for SPECfp applications, with geometric means of 1.14 for SPECint and 1.34 for SPECfp. We also evaluate an implementation of Runahead execution, a previously proposed scheme that performs no value prediction and discards all work done between the checkpoint and the arrival of the data from memory. Against the same baseline, Runahead execution achieves geometric-mean speedups of 1.07 for SPECint and 1.18 for SPECfp.
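The predict-checkpoint-validate cycle described above can be illustrated with a minimal sketch. This is not the paper's hardware design: the last-value predictor, the function names, and the callback-based modeling of speculative execution are all assumptions chosen for clarity.

```python
# Illustrative sketch of CAVA's control flow on an L2 miss (assumptions:
# a simple last-value predictor and a callable standing in for the
# speculative instruction stream; not the paper's actual microarchitecture).

class LastValuePredictor:
    """Predict that a load returns the last value seen at its address."""
    def __init__(self):
        self.table = {}

    def predict(self, addr):
        return self.table.get(addr, 0)   # untrained entries guess 0

    def update(self, addr, value):
        self.table[addr] = value

def handle_l2_miss(addr, memory, predictor, run_speculatively):
    """On an L2 miss: checkpoint, continue with a predicted value, then
    validate against the real value once it arrives at the L2 cache."""
    predicted = predictor.predict(addr)
    # (checkpoint of architectural state taken here; omitted in this sketch)
    spec_work = run_speculatively(predicted)   # results buffered, not committed

    actual = memory[addr]                      # data finally arrives from memory
    predictor.update(addr, actual)
    if actual == predicted:
        return spec_work, "commit"             # speculation succeeded
    else:
        # misprediction: discard buffered work, restart from the checkpoint
        return run_speculatively(actual), "rollback"

predictor = LastValuePredictor()
predictor.update(0x40, 7)                      # warm the predictor
result, outcome = handle_l2_miss(0x40, {0x40: 7}, predictor, lambda v: v * 2)
# correct prediction: outcome == "commit", result == 14
```

A key design point the sketch mirrors: with roughly 50% prediction accuracy, half the misses hide their full latency behind useful committed work, while mispredictions cost only a rollback to the checkpoint rather than corrupting architectural state.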