research-article

Efficiently prefetching complex address patterns

Authors:
Manjunath Shevgoor, University of Utah, Salt Lake City, UT
Sahil Koladiya, University of Utah, Salt Lake City, UT
Rajeev Balasubramonian, University of Utah, Salt Lake City, UT
Chris Wilkerson, Intel Labs, Hillsboro, OR
Seth H. Pugsley, Intel Labs, Hillsboro, OR
Zeshan Chishti, Intel Labs, Hillsboro, OR

MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture, December 2015, Pages 141–152
https://doi.org/10.1145/2830772.2830793

Published: 5 December 2015

ABSTRACT

Prior work in hardware prefetching has focused mostly on either predicting regular streams with uniform strides, or predicting irregular access patterns at the cost of large hardware structures. This paper introduces the Variable Length Delta Prefetcher (VLDP), which builds up delta histories between successive cache line misses within physical pages, and then uses these histories to predict the order of cache line misses in new pages. One of VLDP's distinguishing features is its use of multiple prediction tables, each of which stores predictions based on a different length of input history. For example, the first prediction table takes as input only the single most recent delta between cache misses within a page, and attempts to predict the next cache miss in that page. The second prediction table takes as input a sequence of the two most recent deltas between cache misses within a page, and also attempts to predict the next cache miss in that page, and so on with additional tables. Longer histories generally yield more accurate predictions, so VLDP prefers to make predictions based on the longest history table that has a matching entry.
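The longest-matching-history lookup described above can be illustrated with a minimal Python sketch. This is a simplified software model under stated assumptions, not the hardware design from the paper: the class name `DeltaPredictor`, the `max_history` parameter, and the use of frequency counters to pick a prediction are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


class DeltaPredictor:
    """Toy multi-table delta predictor (illustrative, not the paper's design).

    Table k is keyed by the k most recent deltas and records which
    delta followed that history.
    """

    def __init__(self, max_history=3):
        self.max_history = max_history
        # tables[k - 1] maps a tuple of k deltas -> Counter of next deltas
        self.tables = [defaultdict(Counter) for _ in range(max_history)]

    def train(self, deltas):
        """Record, for every history length, which delta followed it."""
        for i in range(1, len(deltas)):
            nxt = deltas[i]
            for k in range(1, self.max_history + 1):
                if i - k < 0:
                    break  # not enough history yet for this table
                key = tuple(deltas[i - k:i])
                self.tables[k - 1][key][nxt] += 1

    def predict(self, recent):
        """Return (predicted next delta, history length used).

        Tables are tried from the longest history to the shortest, so
        the longest matching entry wins. Returns (None, 0) on no match.
        """
        for k in range(min(self.max_history, len(recent)), 0, -1):
            seen = self.tables[k - 1].get(tuple(recent[-k:]))
            if seen:
                return seen.most_common(1)[0][0], k
        return None, 0


predictor = DeltaPredictor(max_history=3)
# In this delta stream, a delta of 1 is followed by 2 or by 3,
# depending on the delta that came before it.
predictor.train([1, 2, 1, 3, 1, 2, 1, 3])
print(predictor.predict([2, 1]))     # (3, 2)
print(predictor.predict([3, 1]))     # (2, 2)
print(predictor.predict([9, 2, 1]))  # (3, 2): falls back from 3 deltas to 2
```

In this stream the single most recent delta (1) is ambiguous, while the two-delta histories (2, 1) and (3, 1) each have a unique successor, which is the intuition behind preferring the longest matching table.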

Using a global history of patterns it has seen in the past, VLDP is able to issue prefetches without having to wait for additional per-page confirmation, and it is even able to prefetch patterns that show no repetition within a physical page. VLDP does not use the program counter (PC) to make its predictions, but our evaluation shows that it outperforms the highest-performing PC-based prefetcher by 7.1%, and the highest-performing prefetcher that does not employ the PC by 5.8%.
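To make the role of the global history concrete, the following Python sketch tracks deltas per page but stores the learned "delta follows delta" associations in one table shared by all pages. It is a deliberately reduced model: it uses only a single delta of history, and the 4 KiB page size, 64-byte line size, and the names `GlobalDeltaTable`, `split`, and `miss` are assumptions for illustration rather than details taken from the paper.

```python
PAGE_BYTES = 4096  # assumed page size
LINE_BYTES = 64    # assumed cache line size
LINES_PER_PAGE = PAGE_BYTES // LINE_BYTES


def split(addr):
    """Return (page number, cache line offset within the page)."""
    return addr // PAGE_BYTES, (addr % PAGE_BYTES) // LINE_BYTES


class GlobalDeltaTable:
    """Per-page delta tracking with one prediction table shared by all pages."""

    def __init__(self):
        self.last_line = {}   # page -> line offset of the last miss
        self.last_delta = {}  # page -> most recent delta in that page
        self.next_delta = {}  # shared: delta -> delta that last followed it

    def miss(self, addr):
        """Record a miss; return a prefetch address, or None."""
        page, line = split(addr)
        if page in self.last_line:
            delta = line - self.last_line[page]
            prev = self.last_delta.get(page)
            if prev is not None:
                self.next_delta[prev] = delta  # learned for every page
            self.last_delta[page] = delta
        self.last_line[page] = line

        current = self.last_delta.get(page)
        if current is None or current not in self.next_delta:
            return None
        target = line + self.next_delta[current]
        if not 0 <= target < LINES_PER_PAGE:
            return None  # stay inside the physical page
        return page * PAGE_BYTES + target * LINE_BYTES


table = GlobalDeltaTable()
# Page 0 misses lines 0, 1, 4, 5, 8 (deltas +1, +3, +1, +3).
for line in (0, 1, 4, 5, 8):
    table.miss(line * LINE_BYTES)

# Page 7 has never been seen before; it misses lines 10 and 11.
base = 7 * PAGE_BYTES
first = table.miss(base + 10 * LINE_BYTES)
second = table.miss(base + 11 * LINE_BYTES)
print(first)                   # None: no delta exists in this page yet
print(split(second))           # (7, 14): line 11 plus the learned delta +3
```

Page 7 contains no repeated pattern of its own, yet after its first delta the shared table already supplies a prediction learned from page 0.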

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Published in
MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
December 2015
787 pages
ISBN:9781450340342
DOI:10.1145/2830772
General Chair:
Milos Prvulovic
Georgia Tech
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
prefetching
Qualifiers
- research-article
Acceptance Rates
MICRO-48 Paper Acceptance Rate: 61 of 283 submissions, 22%
Overall Acceptance Rate: 484 of 2,242 submissions, 22%
Article Metrics
- Total Citations: 113
- Total Downloads: 685
- Downloads (Last 12 months): 110
- Downloads (Last 6 weeks): 16