
Spatial Memory Streaming

Published: 01 May 2006

Abstract

Prior research indicates that applications' memory access patterns exhibit substantial spatial variation. Modern memory systems, however, use small fixed-size cache blocks and therefore cannot exploit this variation. Increasing the block size would not only prohibitively increase pin and interconnect bandwidth demands but also raise the likelihood of false sharing in shared-memory multiprocessors. In this paper, we show that memory accesses in commercial workloads often exhibit repetitive layouts that span large memory regions (e.g., several kB), and that these accesses recur in patterns that are predictable through code-based correlation. We propose Spatial Memory Streaming, a practical on-chip hardware technique that identifies code-correlated spatial access patterns and streams predicted blocks to the primary cache ahead of demand misses. Using cycle-accurate full-system multiprocessor simulation of commercial and scientific applications, we demonstrate that Spatial Memory Streaming can, on average, predict 58% of L1 misses and 65% of off-chip misses, for a mean performance improvement of 37% and at best 307%.
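The core idea in the abstract can be sketched in Python: record which blocks of a memory region are touched, key that recorded pattern by the code that started the traversal, and replay the pattern the next time the same code starts a traversal. This is a minimal software illustration under stated assumptions, not the authors' hardware design. The 2 kB region, 64-byte blocks (giving a 32-bit pattern per region), the (PC, block offset) trigger key, and the names `SpatialPatternPredictor`, `access`, `end_generation`, and `split` are all hypothetical choices made for illustration. The caller also decides when a region's recording ends, for example when one of its blocks leaves the cache.

```python
BLOCK_SIZE = 64                                # bytes per cache block (assumed)
REGION_SIZE = 2048                             # bytes per spatial region (assumed; "several kB")
BLOCKS_PER_REGION = REGION_SIZE // BLOCK_SIZE  # 32 blocks -> one 32-bit pattern


def split(addr):
    """Return (region base address, block offset within the region)."""
    region = addr // REGION_SIZE * REGION_SIZE
    offset = (addr % REGION_SIZE) // BLOCK_SIZE
    return region, offset


class SpatialPatternPredictor:
    """Hypothetical sketch of code-correlated spatial pattern prediction."""

    def __init__(self):
        # Regions currently being recorded: region base -> [trigger key, bit pattern].
        self.active = {}
        # Learned patterns: (trigger PC, trigger block offset) -> bit pattern.
        self.history = {}

    def access(self, pc, addr):
        """Observe one access; return the block addresses to stream, if any."""
        region, offset = split(addr)
        if region in self.active:
            # Region already being recorded: just mark this block as touched.
            self.active[region][1] |= 1 << offset
            return []
        # First access to the region (the trigger): start a new recording.
        key = (pc, offset)
        self.active[region] = [key, 1 << offset]
        pattern = self.history.get(key, 0)
        # Stream every predicted block except the one being demanded right now.
        return [region + i * BLOCK_SIZE
                for i in range(BLOCKS_PER_REGION)
                if pattern >> i & 1 and i != offset]

    def end_generation(self, region):
        """Stop recording a region and remember its pattern under its trigger key."""
        entry = self.active.pop(region, None)
        if entry is not None:
            key, pattern = entry
            self.history[key] = pattern


if __name__ == "__main__":
    p = SpatialPatternPredictor()
    # Code at PC 0x400 touches blocks 0, 3 and 5 of the region at 0x10000.
    for a in (0x10000, 0x100C0, 0x10140):
        p.access(0x400, a)
    p.end_generation(0x10000)
    # The same code triggering on a new region replays the learned layout.
    print([hex(a) for a in p.access(0x400, 0x20000)])  # prints ['0x200c0', '0x20140']
```

Keying on the triggering instruction rather than on the data address is what lets a pattern learned on one region predict accesses to a region never seen before, which is the sense in which the abstract calls the patterns "code-correlated".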



Published in

ACM SIGARCH Computer Architecture News, Volume 34, Issue 2, May 2006. 383 pages. ISSN: 0163-5964. DOI: 10.1145/1150019.

ISCA '06: Proceedings of the 33rd Annual International Symposium on Computer Architecture, June 2006. 383 pages. ISBN: 076952608X.

Copyright © 2006 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
