ASPLOS Conference Proceedings · DOI: 10.1145/3445814.3446752
research-article
Public Access

A hierarchical neural model of data prefetching

Published: 17 April 2021

ABSTRACT

This paper presents Voyager, a novel neural network for data prefetching. Unlike previous neural models for prefetching, which are limited to learning delta correlations, our model can also learn address correlations, which are important for prefetching irregular sequences of memory accesses. The key to our solution is its hierarchical structure that separates addresses into pages and offsets and that introduces a mechanism for learning important relations among pages and offsets. Voyager provides significant prediction benefits over current data prefetchers. For a set of irregular programs from the SPEC 2006 and GAP benchmark suites, Voyager sees an average IPC improvement of 41.6% over a system with no prefetcher, compared with 21.7% and 28.2%, respectively, for idealized Domino and ISB prefetchers. We also find that for two commercial workloads for which current data prefetchers see very little benefit, Voyager dramatically improves both accuracy and coverage. At present, slow training and prediction preclude neural models from being practically used in hardware, but Voyager's overheads are significantly lower, in every dimension, than those of previous neural models. For example, computation cost is reduced by 15-20×, and storage overhead is reduced by 110-200×. Thus, Voyager represents a significant step towards a practical neural prefetcher.
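The abstract's two technical ideas, splitting each address into a page and an offset and the contrast between delta and address correlations, can be illustrated with a small Python sketch. This is a toy under stated assumptions, not Voyager's model: it assumes 64-byte cache lines and 4 KiB pages (so 64 cache-line offsets per page), and it uses hypothetical last-successor tables (`address_correlation_hits`, `delta_correlation_hits`) in place of any neural network. In the hypothetical repeating trace, delta 64 is followed by 960 in one place and by 3008 in another, so a table keyed on the previous delta keeps overwriting its entry, whereas every address has a single successor.

```python
"""Toy sketch: page/offset decomposition and address vs. delta correlation.

Assumptions (not taken from the paper): 64-byte cache lines, 4 KiB pages,
and simple last-successor tables standing in for learned models.
"""

LINE_BITS = 6                          # assumed 64-byte cache lines
PAGE_BITS = 12                         # assumed 4 KiB pages
OFFSET_BITS = PAGE_BITS - LINE_BITS    # 6 bits -> 64 cache lines per page
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split(addr: int) -> tuple[int, int]:
    """Split a byte address into (page number, cache-line offset within the page)."""
    line = addr >> LINE_BITS
    return line >> OFFSET_BITS, line & OFFSET_MASK


def join(page: int, offset: int) -> int:
    """Rebuild the line-aligned byte address from a (page, offset) pair."""
    return ((page << OFFSET_BITS) | offset) << LINE_BITS


def address_correlation_hits(trace: list[int]) -> int:
    """Count correct predictions when the previous address predicts the next one."""
    table: dict[int, int] = {}
    hits = 0
    for prev, nxt in zip(trace, trace[1:]):
        hits += table.get(prev) == nxt   # bool adds as 0 or 1
        table[prev] = nxt                # remember the most recent successor
    return hits


def delta_correlation_hits(trace: list[int]) -> int:
    """Count correct predictions when the previous delta predicts the next delta."""
    table: dict[int, int] = {}
    hits = 0
    for a, b, c in zip(trace, trace[1:], trace[2:]):
        prev_delta, next_delta = b - a, c - b
        hits += table.get(prev_delta) == next_delta
        table[prev_delta] = next_delta   # a repeated delta overwrites its entry
    return hits


# Hypothetical irregular sequence of line-aligned addresses, repeated 3 times.
pattern = [0x0000, 0x0040, 0x0400, 0x0440, 0x1000]
trace = pattern * 3

print([split(a) for a in pattern])  # → [(0, 0), (0, 1), (0, 16), (0, 17), (1, 0)]
print(address_correlation_hits(trace), delta_correlation_hits(trace))  # → 9 4
```

Under this assumed geometry the offset vocabulary is fixed at 64 values, while the number of distinct pages grows with the program's footprint. The sketch only shows the decomposition and the aliasing problem of delta-keyed history; it says nothing about how Voyager itself learns relations among pages and offsets.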


    • Published in

      ACM Conferences
      ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
      April 2021
      1090 pages
      ISBN:9781450383172
      DOI:10.1145/3445814

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      Overall Acceptance Rate: 535 of 2,713 submissions, 20%
