
The tao of parallelism in algorithms

Published: 04 June 2011

Abstract

For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis in which the key data structures are "irregular" data structures like graphs, trees, and sets.

To address the need for better abstractions, we introduce a data-centric formulation of algorithms called the operator formulation in which an algorithm is expressed in terms of its action on data structures. This formulation is the basis for a structural analysis of algorithms that we call tao-analysis. Tao-analysis can be viewed as an abstraction of algorithms that distills out algorithmic properties important for parallelization. It reveals that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous in algorithms, and that, depending on the tao-structure of the algorithm, this parallelism may be exploited by compile-time, inspector-executor or optimistic parallelization, thereby unifying these seemingly unrelated parallelization techniques. Regular algorithms emerge as a special case of irregular algorithms, and many application-specific optimization techniques can be generalized to a broader context.

These results suggest that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.
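To make the operator formulation concrete, here is a minimal, sequential sketch (all names are illustrative, not taken from the paper): an algorithm is expressed as a set of active nodes, an operator applied to each active node's neighborhood, and a schedule. This instance computes connected components by propagating minimum labels over a worklist; in an amorphous data-parallel execution, active nodes with disjoint neighborhoods could be processed concurrently.

```python
from collections import deque

# Sketch of the operator formulation: active nodes, an operator that reads
# and writes a node's neighborhood, and a schedule (here, FIFO).
def connected_components(adj):
    # adj: dict mapping each node to an iterable of its neighbors
    label = {v: v for v in adj}          # initial label = node's own id
    worklist = deque(adj)                # every node starts out active
    while worklist:
        v = worklist.popleft()           # schedule: pick an active node
        for u in adj[v]:                 # operator: act on the neighborhood
            if label[v] < label[u]:
                label[u] = label[v]      # an update may create new work
                worklist.append(u)
    return label

components = connected_components({
    0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]
})
# nodes 0, 1, 2 end up sharing one label; nodes 3, 4 another
```

The choice of schedule is independent of the operator, which is the point of the formulation: replacing the FIFO deque with a priority queue or a speculative parallel scheduler changes how parallelism is exploited without changing the algorithm's specification.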



Published in

ACM SIGPLAN Notices, Volume 46, Issue 6 (PLDI '11), June 2011, 652 pages. ISSN: 0362-1340, EISSN: 1558-1160. DOI: 10.1145/1993316.

Also appears in PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2011, 668 pages. ISBN: 9781450306638. DOI: 10.1145/1993498. General Chair: Mary Hall; Program Chair: David Padua.

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
