A Programmable Memory Hierarchy for Prefetching Linked Data Structures

  • Conference paper
High Performance Computing (ISHPC 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2327))

Abstract

Prefetching is often used to overlap memory latency with computation in array-based applications. However, prefetching for pointer-intensive applications remains a challenge because of their irregular memory access patterns and the pointer-chasing problem: the address of the next node is not known until the current node has been loaded. In this paper, we place a programmable processor, a prefetch engine (PFE), at each level of the memory hierarchy; the PFEs cooperatively execute instructions that traverse a linked data structure. Cache blocks accessed by the PFEs at the L2 and memory levels are proactively pushed up to the CPU.

We examine several design issues in supporting this programmable memory hierarchy. We establish a general interaction scheme among the three PFEs and design a mechanism to synchronize PFE execution with the CPU. Our simulation results show that the proposed prefetching scheme can reduce memory stall time by up to 100% on a suite of pointer-intensive applications, reducing overall execution time by an average of 19%.
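
The pointer-chasing dependency and the push idea described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's simulator or PFE design: the names `Node`, `build_list`, and `traverse` are hypothetical, the "cache" is an unbounded set, timing is ignored entirely, and the engine is assumed to start on the CPU's first miss and then stay one node ahead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def build_list(values):
    """Build a singly linked list holding `values` in order."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def traverse(head, use_engine):
    """Sum the list and count CPU accesses that miss in a toy cache.

    Assumption (not from the paper): the first CPU miss triggers a
    hypothetical prefetch engine, which then runs the same traversal
    kernel one node ahead and pushes each node it visits into the cache.
    """
    cache = set()   # ids of nodes currently available to the CPU
    misses = 0
    total = 0
    engine = None   # engine position; idle until triggered
    node = head
    while node is not None:
        if id(node) not in cache:
            misses += 1
            cache.add(id(node))
            if use_engine and engine is None:
                engine = node.next   # start the engine at the successor
        if engine is not None:
            cache.add(id(engine))    # push the node up to the CPU
            engine = engine.next     # the engine chases the pointer itself
        total += node.value          # the CPU's work on this node
        node = node.next             # next address depends on this load
    return total, misses
```

With an eight-node list, `traverse(build_list(range(1, 9)), use_engine=False)` misses on every node, while `use_engine=True` misses only on the head, because every later node has already been pushed. The model shows only why running the traversal near the data can hide serialized misses; it says nothing about the actual speedups reported in the paper.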

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, CL., Lebeck, A. (2002). A Programmable Memory Hierarchy for Prefetching Linked Data Structures. In: Zima, H.P., Joe, K., Sato, M., Seo, Y., Shimasaki, M. (eds) High Performance Computing. ISHPC 2002. Lecture Notes in Computer Science, vol 2327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47847-7_15

  • DOI: https://doi.org/10.1007/3-540-47847-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43674-4

  • Online ISBN: 978-3-540-47847-8

  • eBook Packages: Springer Book Archive
