A hierarchical neural model of data prefetching

Authors:
Zhan Shi

University of Texas at Austin, USA

University of Texas at Austin, USA
View Profile

,
Akanksha Jain

University of Texas at Austin, USA

University of Texas at Austin, USA
View Profile

,
Kevin Swersky

Google, USA

Google, USA
View Profile

,
Milad Hashemi

Google, USA

Google, USA
View Profile

,
Parthasarathy Ranganathan

Google, USA

Google, USA
View Profile

,
Calvin Lin

University of Texas at Austin, USA

University of Texas at Austin, USA
View Profile

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating SystemsApril 2021Pages 861–873https://doi.org/10.1145/3445814.3446752

Published:17 April 2021Publication History

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 861–873

ABSTRACT

This paper presents Voyager, a novel neural network for data prefetching. Unlike previous neural models for prefetching, which are limited to learning delta correlations, our model can also learn address correlations, which are important for prefetching irregular sequences of memory accesses. The key to our solution is its hierarchical structure that separates addresses into pages and offsets and that introduces a mechanism for learning important relations among pages and offsets. Voyager provides significant prediction benefits over current data prefetchers. For a set of irregular programs from the SPEC 2006 and GAP benchmark suites, Voyager sees an average IPC improvement of 41.6% over a system with no prefetcher, compared with 21.7% and 28.2%, respectively, for idealized Domino and ISB prefetchers. We also find that for two commercial workloads for which current data prefetchers see very little benefit, Voyager dramatically improves both accuracy and coverage. At present, slow training and prediction preclude neural models from being practically used in hardware, but Voyager’s overheads are significantly lower—in every dimension—than those of previous neural models. For example, computation cost is reduced by 15- 20×, and storage overhead is reduced by 110-200×. Thus, Voyager represents a significant step towards a practical neural prefetcher.

References

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jefrey Dean, Matthieu Devin, Sanjay Ghemawat, Geofrey Irving, Michael Isard, et al. Tensorflow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 265-283, 2016.Google Scholar
Jean-Loup Baer and Tien-Fu Chen. Efective hardware-based data prefetching for high-performance processors. IEEE Transactions on Computers, 44 ( 5 ): 609-623, May 1995.Google ScholarDigital Library
Mohammad Bakhshalipour, Pejman Lotfi-Kamran, and Hamid Sarbazi-Azad. Domino temporal data prefetcher. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 131-142, 2018.Google ScholarCross Ref
Mohammad Bakhshalipour, Mehran Shakerinava, Pejman Lotfi-Kamran, and Hamid Sarbazi-Azad. Bingo spatial data prefetcher. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 399-411, 2019.Google ScholarCross Ref
Scott Beamer, Krste Asanovi?, and David Patterson. The GAP benchmark suite. arXiv preprint arXiv:1508.03619, 2015.Google Scholar
Derek Bruening, Timothy Garnett, and Saman Amarasinghe. An infrastructure for adaptive dynamic optimization. In International Symposium on Code Generation and Optimization, 2003. CGO 2003., pages 265-275. IEEE, 2003.Google ScholarCross Ref
Doug Burger, Thomas R. Puzak, Wei-Fen Lin, and Steven K. Reinhardt. Filtering superfluous prefetches using density vectors. In Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors (ICCD), pages 124-133, 2001.Google Scholar
Chi F. Chen, Se-Hyun Yang, Babak Falsafi, and Andreas Moshovos. Accurate and complexity-efective spatial pattern prediction. In Proceedings of the 10th International Symposium on High Performance Computer Architecture (HPCA), pages 276-288, 2004.Google ScholarDigital Library
Trishul M. Chilimbi. Eficient representations and abstractions for quantifying and exploiting data reference locality. In SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 191-202, 2001.Google Scholar
Yuan Chou. Low-cost epoch-based correlation prefetching for commercial applications. In Proceedings of the 40th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO), pages 301-313, 2007.Google ScholarDigital Library
Keith I. Farkas, Paul Chow, Norman P. Jouppi, and Zvonko Vranesic. Memorysystem design considerations for dynamically-scheduled processors. In Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA), pages 133-143, 1997.Google ScholarDigital Library
Greg Hamerly, Erez Perelman, Jeremy Lau, and Brad Calder. Simpoint 3. 0: Faster and more flexible program phase analysis. Journal of Instruction Level Parallelism, 7 ( 4 ): 1-28, 2005.Google Scholar
Milad Hashemi, Kevin Swersky, Jamie A Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, and Parthasarathy Ranganathan. Learning memory access patterns. arXiv preprint arXiv:1803.02329, 2018.Google Scholar
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9 ( 8 ): 1735-1780, 1997.Google Scholar
Zhigang Hu, Margaret Martonosi, and Stefanos Kaxiras. TCP: tag correlating prefetchers. In International Symposium on, High Performance Computer Architecture (HPCA), pages 317-326, 2003.Google Scholar
Ibrahim Hur and Calvin Lin. Memory prefetching using adaptive stream detection. In Proceedings of the 39th International Symposium on Microarchitecture (MICRO), pages 397-408, 2006.Google ScholarDigital Library
Yasuo Ishii, Mary Inaba, and Kei Hiraki. Access map pattern matching for high performance data cache prefetch. Journal of Instruction-Level Parallelism, 13 : 1-24, 2011.Google Scholar
Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geofrey E. Hinton. Adaptive mixtures of local experts. Neural computation, 3 ( 1 ): 79-87, 1991.Google Scholar
Akanksha Jain and Calvin Lin. Linearizing irregular memory accesses for improved correlated prefetching. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 247-259, 2013.Google ScholarDigital Library
Akanksha Jain and Calvin Lin. Back to the future: Leveraging belady's algorithm for improved cache replacement. In Proceedings of the International Symposium on Computer Architecture (ISCA), June 2016.Google ScholarDigital Library
Aamer Jaleel, Robert S Cohn, Chi-Keung Luk, and Bruce Jacob. Cmp$im: A Pin-based on-the-fly multi-core cache simulator. In Proceedings of the Fourth Annual Workshop on Modeling, Benchmarking and Simulation (MoBS), co-located with ISCA, pages 28-36, 2008.Google Scholar
Daniel A Jiménez. Multiperspective perceptron predictor. In The Journal of Instruction-Level Parallelism 5th JILP Workshop on Computer Architecture Competitions (JWAC-5), Championship Branch Prediction, (co-located with ISCA 2016 ), 2016.Google Scholar
Daniel A Jiménez and Calvin Lin. Dynamic branch prediction with perceptrons. In Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA), pages 197-206, 2001.Google ScholarCross Ref
Daniel A Jiménez and Elvira Teran. Multiperspective reuse prediction. In 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 436-448. IEEE, 2017.Google ScholarDigital Library
Teresa L. Johnson, Matthew C. Merten, and Wen-Mei W. Hwu. Run-time spatial locality detection and optimization. In Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO), pages 57-64, 1997.Google ScholarDigital Library
Doug Joseph and Dirk Grunwald. Prefetching using markov predictors. In Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA), pages 252-263, 1997.Google ScholarDigital Library
Norman P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch bufers. In International Symposium on Computer Architecture (ISCA), pages 364-373, 1990.Google Scholar
Samira Khan, Yingying Tian, and Daniel A Jiménez. Sampling dead block prediction for last-level caches. In 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 175-186, 2010.Google ScholarDigital Library
Jinchun Kim, Seth H Pugsley, Paul V Gratz, AL Reddy, Chris Wilkerson, and Zeshan Chishti. Path confidence based lookahead prefetching. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture, page 60. IEEE Press, 2016.Google Scholar
Tim Kraska, Alex Beutel, Ed H. Chi, Jef Dean, and Neoklis Polyzotis. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD), 2018.Google ScholarDigital Library
Sanjeev Kumar and Christopher Wilkerson. Exploiting spatial locality in data caches using spatial footprints. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 357-368, 1998.Google ScholarCross Ref
Pierre Michaud. Best-ofset hardware prefetching. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 469-480, 2016.Google ScholarCross Ref
Pierre Michaud. Best-ofset hardware prefetching. In 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 469-480, 2016.Google ScholarCross Ref
Jinseok Nam, Jungi Kim, Eneldo Loza Mencía, Iryna Gurevych, and Johannes Fürnkranz. Large-scale multi-label text classification-revisiting neural networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 437-452, 2014.Google ScholarDigital Library
Kyle J. Nesbit, Ashutosh S. Dhodapkar, and James E. Smith. AC/DC: an adaptive data cache prefetcher. In 13th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 135-145, 2004.Google ScholarCross Ref
Kyle J. Nesbit and James E. Smith. Data cache prefetching using a global history bufer. IEEE Micro, 25 ( 1 ): 90-97, 2005.Google Scholar
Subbarao Palacharla and Richard E. Kessler. Evaluating stream bufers as a secondary cache replacement. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 24-33, April 1994.Google ScholarCross Ref
Leeor Peled, Shie Mannor, Uri Weiser, and Yoav Etsion. Semantic locality and context-based prefetching using reinforcement learning. In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pages 285-297, 2015.Google ScholarDigital Library
Leeor Peled, Uri Weiser, and Yoav Etsion. A neural network prefetcher for arbitrary memory access patterns. ACM Transactions on Architecture and Code Optimization (TACO), page 37, 2019.Google Scholar
Seth H Pugsley, Zeshan Chishti, Chris Wilkerson, Peng-fei Chuang, Robert L Scott, Aamer Jaleel, Shih-Lien Lu, Kingsum Chow, and Rajeev Balasubramonian. Sandbox prefetching: Safe run-time evaluation of aggressive prefetchers. In IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 2014.Google ScholarCross Ref
Suleyman Sair, Timothy Sherwood, and Brad Calder. A decoupled predictordirected stream prefetching architecture. IEEE Transactions on Computers, 52 ( 3 ): 260-276, March 2003.Google ScholarDigital Library
Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian, Chris Wilkerson, Seth H. Pugsley, and Zeshan Chisthi. Eficiently prefetching complex address patterns. In Proceedings of the 48th International Symposium on Microarchitecture (MICRO), pages 141-152, 2015.Google ScholarDigital Library
Zhan Shi, Xiangru Huang, Akanksha Jain, and Calvin Lin. Applying deep learning to the cache replacement problem. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 413-425, 2019.Google ScholarDigital Library
A.J. Smith. Sequential program prefetching in memory hierarchies. IEEE Transactions on Computers, 11 ( 12 ): 7-12, December 1978.Google ScholarDigital Library
Yan Solihin, Jaejin Lee, and Josep Torrellas. Using a user-level memory thread for correlation prefetching. In Proceedings of the 29th Annual International Symposium on Computer Architecture (ISCA), pages 171-182, 2002.Google ScholarCross Ref
Stephen Somogyi, Thomas F. Wenisch, Anastasia Ailamaki, and Babak Falsafi. Spatio-temporal memory streaming. In Proceedings of the International Symposium on Computer Architecture (ISCA), pages 69-80, 2009.Google ScholarDigital Library
Stephen Somogyi, Thomas F. Wenisch, Anastassia Ailamaki, Babak Falsafi, and Andreas Moshovos. Spatial memory streaming. In Proceedings of the 33th Annual International Symposium on Computer Architecture (ISCA), pages 252-263, 2006.Google ScholarDigital Library
Ajitesh Srivastava, Angelos Lazaris, Benjamin Brooks, Rajgopal Kannan, and Viktor K. Prasanna. Predicting memory accesses: The road to compact ml-driven prefetcher. In Proceedings of the International Symposium on Memory Systems (MEMSYS), pages 461-470, 2019.Google ScholarDigital Library
Stephen J Tarsa, Chit-Kwan Lin, Gokce Keskin, Gautham Chinya, and Hong Wang. Improving branch prediction by modeling global history with convolutional neural networks. arXiv preprint arXiv:1906.09889, 2019.Google Scholar
Elvira Teran, Zhe Wang, and Daniel A Jiménez. Perceptron learning for reuse prediction. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1-12, 2016.Google ScholarCross Ref
Grigorios Tsoumakas and Ioannis Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining (IJDWM), 3 ( 3 ): 1-13, 2007.Google ScholarCross Ref
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, ?ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998-6008, 2017.Google ScholarDigital Library
Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos. Temporal streams in commercial server applications. In IEEE International Symposium on Workload Characterization, pages 99-108, 2008.Google ScholarCross Ref
Thomas F Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, and Andreas Moshovos. Practical of-chip meta-data for temporal memory streaming. In 2009 IEEE 15th International Symposium on High Performance Computer Architecture (HPCA), pages 79-90, 2009.Google ScholarCross Ref
Carole-Jean Wu, Aamer Jaleel, Will Hasenplaugh, Margaret Martonosi, Simon C. Steely, Jr., and Joel Emer. SHiP: Signature-based hit predictor for high performance caching. In 44th IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 430-441, 2011.Google ScholarDigital Library
Hao Wu, Krishnendra Nathella, Joseph Pusdesris, Dam Sunwoo, Akanksha Jain, and Calvin Lin. Temporal prefetching without the of-chip metadata. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 996-1008, 2019.Google ScholarDigital Library
Hao Wu, Krishnendra Nathella, Dam Sunwoo, Akanksha Jain, and Calvin Lin. Eficient metadata management for irregular data prefetching. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA), pages 449-461, 2019.Google ScholarDigital Library
Siavash Zangeneh, Stephen Pruett, Sangkug Lym, and Yale N Patt. Branchnet : A convolutional neural network to predict hard-to-predict branches. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 118-130, 2020.Google ScholarCross Ref

Index Terms

A hierarchical neural model of data prefetching
1. Computer systems organization
  1. Architectures

Recommendations

Stealth prefetching
Proceedings of the 2006 ASPLOS Conference

Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system designs grow to incorporate larger numbers of faster processors, memory latency and interconnect traffic increase. While aggressive prefetching ...
Read More
Stealth prefetching
ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems

Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system designs grow to incorporate larger numbers of faster processors, memory latency and interconnect traffic increase. While aggressive prefetching ...
Read More
Increasing hardware data prefetching performance using the second-level cache

Techniques to reduce or tolerate large memory latencies are critical for achieving high processor performance. Hardware data prefetching is one of the most heavily studied solutions, but it is essentially applied to first-level caches where it can ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
ISBN:9781450383172
DOI:10.1145/3445814
General Chair:
Tim Sherwood
University of California at Santa Barbara, USA
,
Program Chairs:
Emery Berger
University of Massachusetts at Amherst, USA
,
Christos Kozyrakis
Stanford University, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 April 2021
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Attention Mechanism
Neural Networks
Prefetching
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate535of2,713submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 20
  Total Citations
  View Citations
- 2,011
  Total Downloads
- Downloads (Last 12 months)722
- Downloads (Last 6 weeks)96
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.