DOI: 10.1145/379539.379559

Language support for Morton-order matrices

Published: 18 June 2001

ABSTRACT

The uniform representation of 2-dimensional arrays serially in Morton order (or Z order) supports both their iterative scan with Cartesian indices and their divide-and-conquer manipulation as quaternary trees. This data structure is important because it relaxes serious problems of locality and latency, and the tree helps to schedule multiprocessing. Results here show how it facilitates algorithms that avoid cache misses and page faults at all levels in hierarchical memory, independently of a specific runtime environment.
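As a concrete illustration (a minimal C sketch of our own, not drawn from the paper), the following shows the bit-interleaving that maps a Cartesian index (i, j) onto its Morton offset. The names dilate and morton_index are hypothetical, and a production version would precompute or strength-reduce this arithmetic.

    #include <stdint.h>

    /* Spread the bits of a 32-bit index so that bit k moves to bit 2k,
     * leaving zeros in the odd positions (a "dilated" integer). */
    static uint64_t dilate(uint32_t x) {
        uint64_t d = x;
        d = (d | (d << 16)) & 0x0000FFFF0000FFFFULL;
        d = (d | (d <<  8)) & 0x00FF00FF00FF00FFULL;
        d = (d | (d <<  4)) & 0x0F0F0F0F0F0F0F0FULL;
        d = (d | (d <<  2)) & 0x3333333333333333ULL;
        d = (d | (d <<  1)) & 0x5555555555555555ULL;
        return d;
    }

    /* Morton (Z-order) offset of element (i, j): row bits land in the odd
     * positions, column bits in the even positions, so the four quadrants
     * of any aligned power-of-two block are contiguous in memory. */
    static uint64_t morton_index(uint32_t i, uint32_t j) {
        return (dilate(i) << 1) | dilate(j);
    }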

We have built a rudimentary C-to-C translator that implements matrices in Morton order from source code that presumes a row-major implementation. Early performance results from LAPACK's reference implementation of dgesv (the linear solver), and from all its supporting routines (including dgemm matrix multiplication), form a successful research demonstration. That performance predicts improvements from new algebra in back-end optimizers.
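To suggest the flavor of such a translation (a hedged sketch under assumed conventions, not the translator's actual output), a row-major scan of row i, such as for (j = 0; j < n; ++j) sum += A[i][j];, can be rewritten so that the row's dilated contribution is hoisted out of the loop and the column index advances by dilated-integer addition rather than being re-interleaved on every iteration:

    /* Hypothetical rewriting of a row-major scan of row i over a
     * Morton-ordered matrix A, using dilate() as sketched above.  Setting
     * the unused odd bits to one lets the carry ripple across them, so a
     * single add and mask advance the dilated column index. */
    #define EVEN_MASK 0x5555555555555555ULL   /* column bits live here */
    #define ODD_MASK  0xAAAAAAAAAAAAAAAAULL   /* row bits live here    */

    double sum_row(const double *A, uint32_t i, uint32_t n) {
        uint64_t di = dilate(i) << 1;         /* fixed row contribution */
        uint64_t dj = 0;                      /* dilated column index   */
        double sum = 0.0;
        for (uint32_t j = 0; j < n; ++j) {
            sum += A[di | dj];
            dj = ((dj | ODD_MASK) + 1) & EVEN_MASK;   /* dilated j + 1  */
        }
        return sum;
    }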

We also present results from a more stylish dgemm algorithm that takes better advantage of this representation. With only routine back-end optimizations inserted by hand (unfolding the base case and passing arguments in registers), we achieve performance exceeding that of the manufacturer-crafted dgemm, running at 67% of peak flops. The same code performs similarly on several machines.
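For flavor, here is a hedged sketch (ours, not the paper's code) of such a block-recursive multiply over Morton-order storage. Because the four quadrants of an aligned power-of-two block are the four contiguous quarters of its elements, the recursion needs only offset arithmetic; the trivial base case below is where a real implementation would hand-unfold a larger kernel and pass arguments in registers, as described above.

    #include <stddef.h>

    /* C += A * B for Morton-ordered n x n blocks, n a power of two.
     * Quadrant order in memory: 0 = NW, 1 = NE, 2 = SW, 3 = SE. */
    static void mm(double *c, const double *a, const double *b, size_t n) {
        if (n == 1) {                 /* unfold a larger base case in practice */
            c[0] += a[0] * b[0];
            return;
        }
        size_t q = (n / 2) * (n / 2); /* elements per quadrant */
        const double *a00 = a, *a01 = a + q, *a10 = a + 2*q, *a11 = a + 3*q;
        const double *b00 = b, *b01 = b + q, *b10 = b + 2*q, *b11 = b + 3*q;
        double       *c00 = c, *c01 = c + q, *c10 = c + 2*q, *c11 = c + 3*q;

        mm(c00, a00, b00, n/2);  mm(c00, a01, b10, n/2);   /* C00 += A00*B00 + A01*B10 */
        mm(c01, a00, b01, n/2);  mm(c01, a01, b11, n/2);   /* C01 += A00*B01 + A01*B11 */
        mm(c10, a10, b00, n/2);  mm(c10, a11, b10, n/2);   /* C10 += A10*B00 + A11*B10 */
        mm(c11, a10, b01, n/2);  mm(c11, a11, b11, n/2);   /* C11 += A10*B01 + A11*B11 */
    }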

Together, these results show how existing codes and future block-recursive algorithms can work well together on this matrix representation. Locality is key to future performance, and the new representation has a remarkable impact.



Published in
PPoPP '01: Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
June 2001, 142 pages
ISBN: 1581133464
DOI: 10.1145/379539
Copyright © 2001 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
