Abstract
For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis in which the key data structures are "irregular" data structures like graphs, trees, and sets.
To address the need for better abstractions, we introduce a data-centric formulation of algorithms called the operator formulation in which an algorithm is expressed in terms of its action on data structures. This formulation is the basis for a structural analysis of algorithms that we call tao-analysis. Tao-analysis can be viewed as an abstraction of algorithms that distills out algorithmic properties important for parallelization. It reveals that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous in algorithms, and that, depending on the tao-structure of the algorithm, this parallelism may be exploited by compile-time, inspector-executor, or optimistic parallelization, thereby unifying these seemingly unrelated parallelization techniques. Regular algorithms emerge as a special case of irregular algorithms, and many application-specific optimization techniques can be generalized to a broader context.
These results suggest that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.
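To make the operator formulation concrete, the sketch below shows one way an unordered irregular algorithm can be written as active elements, an operator applied to each active element's neighborhood, and a worklist schedule. It is an illustrative example only, not the paper's Galois API; the graph representation, the `relabelOperator` function, and the label-propagation example are assumptions chosen for brevity. Because operator applications whose neighborhoods do not overlap can be executed in any order, such independent activities expose the amorphous data-parallelism the abstract describes.

```cpp
// A minimal sketch (hypothetical code, not the Galois system's API) of the
// operator formulation: active nodes, an operator on a node's neighborhood,
// and an unordered worklist schedule. Example: label propagation for
// connected components, where each node converges to its component's minimum label.
#include <cstdio>
#include <deque>
#include <vector>

struct Graph {
    std::vector<std::vector<int>> adj;  // adjacency lists
    std::vector<int> label;             // per-node data read/written by the operator
};

// The operator: acts on the neighborhood of one active node. Lowering a
// neighbor's label activates that neighbor, generating new work.
void relabelOperator(Graph& g, int node, std::deque<int>& worklist) {
    for (int nbr : g.adj[node]) {
        if (g.label[node] < g.label[nbr]) {
            g.label[nbr] = g.label[node];
            worklist.push_back(nbr);    // newly activated element
        }
    }
}

int main() {
    // Toy undirected graph with two components: {0,1,2} and {3,4}.
    Graph g;
    g.adj = {{1}, {0, 2}, {1}, {4}, {3}};
    g.label = {0, 1, 2, 3, 4};          // initially every node is its own label

    // Unordered algorithm: all nodes start active. Any serialization of the
    // operator applications is correct, so applications on non-overlapping
    // neighborhoods could run in parallel (amorphous data-parallelism); here
    // they are simply executed sequentially from a worklist.
    std::deque<int> worklist = {0, 1, 2, 3, 4};
    while (!worklist.empty()) {
        int node = worklist.front();
        worklist.pop_front();
        relabelOperator(g, node, worklist);
    }

    for (int v = 0; v < (int)g.label.size(); ++v)
        std::printf("node %d -> component %d\n", v, g.label[v]);
    return 0;
}
```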