DOI: 10.1145/2540708.2540730
research-article

Linearizing irregular memory accesses for improved correlated prefetching

Published: 07 December 2013

ABSTRACT

This paper introduces the Irregular Stream Buffer (ISB), a prefetcher that targets irregular sequences of temporally correlated memory references. The key idea is to use an extra level of indirection to translate arbitrary pairs of correlated physical addresses into consecutive addresses in a new structural address space, which is visible only to the ISB. This structural address space lets the ISB organize its prefetching meta-data so that it is simultaneously temporally and spatially ordered, which improves coverage and accuracy and reduces memory traffic overhead.
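The abstract describes the indirection only in prose. The Python sketch below illustrates the idea under loudly simplified assumptions: address streams are separated by load PC, unbounded dictionaries stand in for the ISB's on-chip and off-chip meta-data, each new stream receives a fresh fixed-size chunk of structural addresses, and an address that is already mapped simply keeps its first mapping. The class `ISBSketch`, its methods, and the `chunk_size` and `degree` parameters are hypothetical illustrations, not the paper's hardware design or interface.

```python
class ISBSketch:
    """Hypothetical, simplified model of the ISB's address indirection.

    Unbounded dicts stand in for the real on-chip caches and off-chip
    meta-data; the paper's remapping policy and meta-data management
    are not modelled.
    """

    def __init__(self, chunk_size: int = 256, degree: int = 2) -> None:
        self.chunk_size = chunk_size  # structural addresses reserved per new stream (illustrative)
        self.degree = degree          # successors prefetched per access (illustrative)
        self.next_chunk = 0           # base of the next unused structural chunk
        self.ps: dict[int, int] = {}    # physical -> structural
        self.sp: dict[int, int] = {}    # structural -> physical
        self.last: dict[int, int] = {}  # PC -> last address seen by that PC

    def _new_chunk(self) -> int:
        base = self.next_chunk
        self.next_chunk += self.chunk_size
        return base

    def _link(self, phys: int, struct: int) -> None:
        # Drop stale entries in both directions so ps and sp stay inverses.
        old_struct = self.ps.pop(phys, None)
        if old_struct is not None:
            self.sp.pop(old_struct, None)
        old_phys = self.sp.pop(struct, None)
        if old_phys is not None:
            self.ps.pop(old_phys, None)
        self.ps[phys] = struct
        self.sp[struct] = phys

    def train(self, pc: int, phys: int) -> None:
        """Record that `phys` followed this PC's previous address."""
        prev = self.last.get(pc)
        self.last[pc] = phys
        if prev is None or prev == phys:
            return
        if prev not in self.ps:
            # prev starts a new stream: give it a fresh structural chunk.
            self._link(prev, self._new_chunk())
        if phys in self.ps:
            return  # simplification: keep the first mapping
        target = self.ps[prev] + 1
        if target % self.chunk_size == 0:
            # Chunk exhausted: continue this stream in a new chunk.
            target = self._new_chunk()
        self._link(phys, target)

    def predict(self, phys: int) -> list[int]:
        """Return the physical addresses at the next `degree` structural slots."""
        s = self.ps.get(phys)
        if s is None:
            return []
        out: list[int] = []
        for k in range(1, self.degree + 1):
            nxt = s + k
            if nxt // self.chunk_size != s // self.chunk_size:
                break  # stop at the end of this stream's chunk
            candidate = self.sp.get(nxt)
            if candidate is None:
                break
            out.append(candidate)
        return out

    def access(self, pc: int, phys: int) -> list[int]:
        """Train on one access, then return the addresses to prefetch."""
        self.train(pc, phys)
        return self.predict(phys)


# Usage: an irregular sequence of line addresses that repeats.
isb = ISBSketch()
trace = [0x9000, 0x1240, 0x7F80, 0x0A00]
first_pass = [isb.access(0x400, a) for a in trace]   # nothing to prefetch yet
second_pass = [isb.access(0x400, a) for a in trace]  # successors now predicted
print([isb.ps[a] for a in trace])  # [0, 1, 2, 3]
print([[hex(p) for p in preds] for preds in second_pass])
```

Because successors are stored at consecutive structural addresses, a single lookup of an address's structural position `s` yields the run `s + 1`, `s + 2`, and so on directly, which is the sense in which the meta-data is both temporally and spatially ordered; a pairwise correlation table would instead follow one successor pointer per prefetch. The chunk-boundary check in this sketch keeps one stream's predictions from running into another stream's structural space.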

We evaluate the ISB using the Marss full-system simulator and the irregular, memory-intensive programs of SPEC CPU 2006 on both single-core and multi-core systems. On a single core, for example, the ISB achieves an average speedup of 23.1% with 93.7% accuracy, compared to a 9.9% speedup and 64.2% accuracy for an idealized prefetcher that over-approximates STMS, the previous best temporal stream prefetcher. This ISB configuration uses 32 KB of on-chip storage and incurs 8.4% memory traffic overhead from meta-data accesses. We also show that a hybrid prefetcher that combines a stride prefetcher and an ISB, with just 8 KB of on-chip storage, achieves a 40.8% speedup with 66.2% accuracy.

Published in

MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
December 2013
498 pages
ISBN: 9781450326384
DOI: 10.1145/2540708

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

MICRO-46 paper acceptance rate: 39 of 239 submissions (16%). Overall acceptance rate: 484 of 2,242 submissions (22%).
