DOI: 10.1145/2830772.2830793
research-article

Efficiently prefetching complex address patterns

Published: 05 December 2015

ABSTRACT

Prior work in hardware prefetching has focused mostly on either predicting regular streams with uniform strides, or predicting irregular access patterns at the cost of large hardware structures. This paper introduces the Variable Length Delta Prefetcher (VLDP), which builds up delta histories between successive cache line misses within physical pages, and then uses these histories to predict the order of cache line misses in new pages. One of VLDP's distinguishing features is its use of multiple prediction tables, each of which stores predictions based on a different length of input history. For example, the first prediction table takes as input only the single most recent delta between cache misses within a page, and attempts to predict the next cache miss in that page. The second prediction table takes as input a sequence of the two most recent deltas between cache misses within a page, and also attempts to predict the next cache miss in that page, and so on with additional tables. Longer histories generally yield more accurate predictions, so VLDP prefers to make predictions based on the longest history table that has a matching entry.

Using a global history of patterns it has seen in the past, VLDP is able to issue prefetches without having to wait for additional per-page confirmation, and it is even able to prefetch patterns that show no repetition within a physical page. VLDP does not use the program counter (PC) to make its predictions, but our evaluation shows that it outperforms the highest-performing PC-based prefetcher by 7.1%, and the highest-performing prefetcher that doesn't employ the PC by 5.8%.
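The longest-match idea described in the abstract can be illustrated with a small software sketch. This is not the paper's hardware design: the class name `DeltaPrefetcherSketch`, the 4 KB page size, the 64-byte line size, the choice of three tables, and the policy of simply overwriting an entry on each update are all assumptions made for illustration. The sketch keeps one table per history length, shares those tables across all pages, and predicts from the longest history that has a matching entry.

```python
from collections import defaultdict

PAGE_SIZE = 4096   # assumed page size in bytes
LINE_SIZE = 64     # assumed cache line size in bytes
MAX_HISTORY = 3    # assumed number of prediction tables
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


class DeltaPrefetcherSketch:
    """Toy model of longest-match delta prediction (hypothetical, simplified)."""

    def __init__(self, max_history=MAX_HISTORY):
        self.max_history = max_history
        # tables[n] maps a tuple of the n most recent deltas to the next delta seen.
        # The tables are shared by all pages, so they act as a global history.
        self.tables = {n: {} for n in range(1, max_history + 1)}
        # Per-page state: last missed line offset and recent deltas in that page.
        self.last_line = {}
        self.deltas = defaultdict(list)

    def on_miss(self, address):
        """Record a cache miss; return a predicted byte address or None."""
        page, line = divmod(address // LINE_SIZE, LINES_PER_PAGE)
        history = self.deltas[page]
        if page in self.last_line:
            delta = line - self.last_line[page]
            # Train every table for which enough history is available.
            for n in range(1, min(len(history), self.max_history) + 1):
                self.tables[n][tuple(history[-n:])] = delta
            history.append(delta)
            del history[:-self.max_history]  # keep only the newest deltas
        self.last_line[page] = line
        # Predict from the longest history that has a matching entry.
        for n in range(min(len(history), self.max_history), 0, -1):
            predicted = self.tables[n].get(tuple(history[-n:]))
            if predicted is not None:
                target = line + predicted
                if 0 <= target < LINES_PER_PAGE:  # stay inside the page
                    return (page * LINES_PER_PAGE + target) * LINE_SIZE
                return None
        return None


prefetcher = DeltaPrefetcherSketch()
# Misses in page 0 at line offsets 0, 1, 3, 4, 6 (deltas +1, +2, +1, +2).
trained = [prefetcher.on_miss(line * LINE_SIZE) for line in (0, 1, 3, 4, 6)]
# A new page benefits from the shared tables after a single delta.
prefetcher.on_miss(PAGE_SIZE)
new_page_prediction = prefetcher.on_miss(PAGE_SIZE + LINE_SIZE)
```

In this run the second miss in the new page already yields a prediction, because the one-delta table learned from page 0 that a delta of +1 tends to be followed by +2. A real design would also need confidence tracking and a bounded table size, which the sketch leaves out.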


      Published in

      MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
      December 2015
      787 pages
      ISBN: 9781450340342
      DOI: 10.1145/2830772

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      MICRO-48 Paper Acceptance Rate: 61 of 283 submissions, 22%
      Overall Acceptance Rate: 484 of 2,242 submissions, 22%
