Abstract
Prefetching is often used to overlap memory latency with computation in array-based applications. However, prefetching for pointer-intensive applications remains a challenge because of their irregular memory access patterns and the pointer-chasing problem. In this paper, we use a programmable processor, a prefetch engine (PFE), at each level of the memory hierarchy to cooperatively execute instructions that traverse a linked data structure. Cache blocks accessed by the PFEs at the L2 and memory levels are proactively pushed up to the CPU.
We examine several design issues in supporting this programmable memory hierarchy. We establish a general interaction scheme among the three PFEs and design a mechanism to synchronize PFE execution with the CPU. Our simulation results show that the proposed prefetching scheme can reduce memory stall time by up to 100% on a suite of pointer-intensive applications, reducing overall execution time by 19% on average.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, CL., Lebeck, A. (2002). A Programmable Memory Hierarchy for Prefetching Linked Data Structures. In: Zima, H.P., Joe, K., Sato, M., Seo, Y., Shimasaki, M. (eds) High Performance Computing. ISHPC 2002. Lecture Notes in Computer Science, vol 2327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47847-7_15
DOI: https://doi.org/10.1007/3-540-47847-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43674-4
Online ISBN: 978-3-540-47847-8
eBook Packages: Springer Book Archive