ABSTRACT
Prior work in hardware prefetching has focused mostly on either predicting regular streams with uniform strides, or predicting irregular access patterns at the cost of large hardware structures. This paper introduces the Variable Length Delta Prefetcher (VLDP), which builds up delta histories between successive cache line misses within physical pages, and then uses these histories to predict the order of cache line misses in new pages. One of VLDP's distinguishing features is its use of multiple prediction tables, each of which stores predictions based on a different length of input history. For example, the first prediction table takes as input only the single most recent delta between cache misses within a page, and attempts to predict the next cache miss in that page. The second prediction table takes as input a sequence of the two most recent deltas between cache misses within a page, and also attempts to predict the next cache miss in that page, and so on with additional tables. Longer histories generally yield more accurate predictions, so VLDP prefers to make predictions based on the longest history table that has a matching entry.
Using a global history of patterns it has seen in the past, VLDP is able to issue prefetches without having to wait for additional per-page confirmation, and it is even able to prefetch patterns that show no repetition within a physical page. VLDP does not use the program counter (PC) to make its predictions, but our evaluation shows that it outperforms the highest-performing PC-based prefetcher by 7.1%, and the highest-performing prefetcher that does not use the PC by 5.8%.
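The multi-table lookup described in this abstract can be illustrated with a small software sketch. It is not the paper's hardware design: the class name `DeltaPrefetcherSketch`, the three-table limit, the 64-lines-per-page figure, and the simple overwrite-on-update policy are all assumptions made for illustration. The sketch records deltas between successive misses within a page, trains one table per history length, and predicts from the longest history that has a matching entry. Because the tables are shared across pages, a pattern learned in one page can be predicted in another.

```python
LINES_PER_PAGE = 64  # assumption: 4 KB pages with 64-byte cache lines
MAX_HISTORY = 3      # assumption: three tables, keyed by 1, 2 and 3 deltas


class DeltaPrefetcherSketch:
    """Illustrative model of longest-matching-history delta prediction."""

    def __init__(self, max_history=MAX_HISTORY):
        self.max_history = max_history
        # tables[n - 1] maps a tuple of the n most recent deltas to the
        # delta that followed it most recently (shared by all pages).
        self.tables = [dict() for _ in range(max_history)]
        # Per-page state: page -> (last missed line offset, recent deltas).
        self.pages = {}

    def predict(self, history):
        """Return the delta from the longest-history table that matches."""
        for n in range(min(len(history), self.max_history), 0, -1):
            key = tuple(history[-n:])
            if key in self.tables[n - 1]:
                return self.tables[n - 1][key]
        return None

    def on_miss(self, page, offset):
        """Record a miss; return the predicted next line offset or None."""
        if page not in self.pages:
            # First miss in this page: no delta is available yet.
            self.pages[page] = (offset, [])
            return None
        last, history = self.pages[page]
        delta = offset - last
        # Train every table whose history length is already available.
        for n in range(1, min(len(history), self.max_history) + 1):
            self.tables[n - 1][tuple(history[-n:])] = delta
        history = (history + [delta])[-self.max_history:]
        self.pages[page] = (offset, history)
        predicted = self.predict(history)
        if predicted is None:
            return None
        target = offset + predicted
        # Predictions stay within the page, as in the abstract.
        return target if 0 <= target < LINES_PER_PAGE else None


prefetcher = DeltaPrefetcherSketch()
for line in [0, 1, 3, 4, 6, 7]:       # page 0: deltas alternate +1, +2
    prefetcher.on_miss(0, line)
prefetcher.on_miss(1, 10)             # first miss in a new page
print(prefetcher.on_miss(1, 11))      # one delta (+1) is enough to predict
```

In this run the second miss in page 1 produces a prediction of line 13, using only the single-delta table trained on page 0. A real design would also need confidence tracking, replacement, and a way to predict from the very first access to a page, none of which this sketch models.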