ABSTRACT
Data prefetching is an important technique for hiding memory latency and improving performance in modern processors. Existing techniques cannot find all, or the best, data streams to prefetch. This paper proposes a new prefetching technique, the Multiple Stream Tracker (MST), that improves on the state of the art by identifying strided accesses in a cache miss stream. Targeting the lower levels of the cache hierarchy, it searches for the best among all possible strided streams to prefetch. A technique to efficiently search and rank multiple strided streams is proposed. The proposed technique can identify streams that subsume those generated by both delta-correlated and standard stride prefetchers. The MST prefetcher can also significantly improve performance in parallel programs.
Applied at the L3 cache, the Multiple Stream Tracker improves IPC by up to 173% (14% on average) over stride prefetching on the SPEC CPU2006 benchmarks. The improvement over delta correlation is up to 92% (5% on average). For SPEComp programs, the speedup over delta correlation is up to 300% (22% on average). MST also has lower average memory bandwidth requirements than prior techniques.
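The idea of searching a miss stream for several candidate strided streams and ranking them can be illustrated in software. The sketch below is not the MST hardware design, which the abstract does not detail. It is a minimal illustration under stated assumptions: the miss history is a short list of addresses, every stride between the latest miss and an earlier miss is a candidate, and candidates are ranked by how many earlier misses fall on the same arithmetic progression. The names `rank_strided_streams`, `next_prefetches`, `min_len`, and `degree` are hypothetical, and the sketch ignores the temporal order of earlier misses for simplicity.

```python
def rank_strided_streams(miss_addrs, min_len=3):
    """Rank candidate strided streams that end at the most recent miss.

    Assumption (not from the paper): a stream's score is the number of
    addresses in the history that lie on the progression
    latest, latest - stride, latest - 2*stride, ...
    Returns a list of (stride, length) pairs, best candidate first.
    """
    if not miss_addrs:
        return []
    latest = miss_addrs[-1]
    seen = set(miss_addrs)
    tried = set()
    candidates = []
    for earlier in miss_addrs[:-1]:
        stride = latest - earlier
        if stride == 0 or stride in tried:
            continue  # skip repeated addresses and strides already scored
        tried.add(stride)
        # Walk backwards from the latest miss along this stride.
        length = 1
        addr = latest - stride
        while addr in seen:
            length += 1
            addr -= stride
        if length >= min_len:
            candidates.append((stride, length))
    # Longer streams rank first; ties go to the smaller absolute stride.
    candidates.sort(key=lambda c: (-c[1], abs(c[0])))
    return candidates


def next_prefetches(miss_addrs, degree=2):
    """Return `degree` prefetch addresses along the best-ranked stream."""
    ranked = rank_strided_streams(miss_addrs)
    if not ranked:
        return []
    stride, _length = ranked[0]
    latest = miss_addrs[-1]
    return [latest + stride * k for k in range(1, degree + 1)]


if __name__ == "__main__":
    # Two interleaved streams: stride 8 from 100, stride 64 from 1000.
    history = [100, 1000, 108, 1064, 116, 1128, 124]
    print(rank_strided_streams(history))
    print(next_prefetches(history))
```

In this example the two streams are interleaved, so the difference between consecutive misses never repeats, yet the candidate search still recovers the stride-8 stream that ends at the latest miss. A hardware implementation would need bounded storage and a cheaper search than this exhaustive walk.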
Index Terms
- Multiple stream tracker: a new hardware stride prefetcher