DOI: 10.1145/2597917.2597941
research-article

Multiple stream tracker: a new hardware stride prefetcher

Published: 20 May 2014

ABSTRACT

Data prefetching is an important technique for hiding memory latency and improving performance in modern processors. Existing techniques cannot find all of the data streams worth prefetching, or the best ones. This paper proposes a new prefetching technique, Multiple Stream Tracker (MST), that improves over the state of the art by identifying strided accesses in a cache miss stream. Targeting the lower levels of the cache hierarchy, it searches for the best among all possible strided streams to prefetch. A technique to efficiently search and rank multiple strided streams is proposed. The proposed technique can identify streams that subsume streams generated by both delta-correlated and standard stride prefetchers. The MST prefetcher can also significantly improve performance in parallel programs.

The Multiple Stream Tracker applied at the L3 cache improves the IPC by up to 173% (14% on average) over stride prefetching for SPEC CPU2006 benchmarks. The improvement is up to 92% over delta correlation (5% on average). The speedup for SPEComp programs is up to 300% over delta correlation (22% on average). MST also has lower average memory bandwidth requirements compared to prior techniques.
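
To make the idea of searching a miss stream for strided streams and ranking them more concrete, here is a minimal Python sketch. It is not the MST hardware algorithm, which the abstract does not spell out. The function names (`rank_strided_streams`, `next_prefetches`), the parameters `max_stride`, `min_length` and `degree`, and the ranking rule (longest stream first) are all assumptions made for illustration. The sketch also ignores the order in which misses arrived and only asks which arithmetic progressions end at the most recent miss.

```python
def rank_strided_streams(miss_lines, max_stride=64, min_length=3):
    """Rank candidate strided streams that end at the most recent miss.

    miss_lines: cache-line numbers of recent misses, oldest first.
    Returns (stride, length) pairs, longest stream first.
    Illustrative only: the ranking rule is an assumption, not MST's.
    """
    if not miss_lines:
        return []
    seen = set(miss_lines)          # arrival order is ignored in this sketch
    latest = miss_lines[-1]
    ranked = []
    for stride in range(-max_stride, max_stride + 1):
        if stride == 0:
            continue
        # Walk backwards from the latest miss in steps of `stride`
        # and count how many earlier misses lie on that progression.
        length = 1
        addr = latest - stride
        while addr in seen:
            length += 1
            addr -= stride
        if length >= min_length:
            ranked.append((stride, length))
    # Longer streams first; ties go to the smaller absolute stride.
    ranked.sort(key=lambda s: (-s[1], abs(s[0])))
    return ranked


def next_prefetches(miss_lines, degree=2, **kwargs):
    """Return `degree` prefetch candidates along the top-ranked stream."""
    ranked = rank_strided_streams(miss_lines, **kwargs)
    if not ranked:
        return []
    stride, _ = ranked[0]
    latest = miss_lines[-1]
    return [latest + stride * k for k in range(1, degree + 1)]


if __name__ == "__main__":
    # A stride-3 stream (100, 103, 106, 109) interleaved with unrelated misses.
    misses = [100, 7, 103, 50, 106, 9, 109]
    print(rank_strided_streams(misses))
    print(next_prefetches(misses))
```

In the sample input the stride-3 stream is interleaved with unrelated misses, so no two consecutive misses share a constant delta. Searching over candidate strides still recovers the stream, which is the kind of case the abstract's search over "all possible strided streams" is aimed at.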


    • Published in

      CF '14: Proceedings of the 11th ACM Conference on Computing Frontiers
      May 2014
      305 pages
      ISBN:9781450328708
      DOI:10.1145/2597917

      Copyright © 2014 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      CF '14 paper acceptance rate: 28 of 62 submissions (45%). Overall acceptance rate: 207 of 584 submissions (35%).
