Abstract
For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis in which the key data structures are "irregular" data structures like graphs, trees, and sets.
To address the need for better abstractions, we introduce a data-centric formulation of algorithms called the operator formulation in which an algorithm is expressed in terms of its action on data structures. This formulation is the basis for a structural analysis of algorithms that we call tao-analysis. Tao-analysis can be viewed as an abstraction of algorithms that distills out algorithmic properties important for parallelization. It reveals that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous in algorithms, and that, depending on the tao-structure of the algorithm, this parallelism may be exploited by compile-time, inspector-executor, or optimistic parallelization, thereby unifying these seemingly unrelated parallelization techniques. Regular algorithms emerge as a special case of irregular algorithms, and many application-specific optimization techniques can be generalized to a broader context.
These results suggest that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.
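To make the operator formulation concrete, the sketch below shows one way an unordered irregular algorithm can be written as active elements, an operator applied to each active element's neighborhood, and a worklist schedule. It is an illustrative example only, not the paper's Galois API; the graph representation, the `relabelOperator` function, and the label-propagation example are assumptions chosen for brevity. Because operator applications whose neighborhoods do not overlap can be executed in any order, such independent activities expose the amorphous data-parallelism the abstract describes.

```cpp
// A minimal sketch (hypothetical code, not the Galois system's API) of the
// operator formulation: active nodes, an operator on a node's neighborhood,
// and an unordered worklist schedule. Example: label propagation for
// connected components, where each node converges to its component's minimum label.
#include <cstdio>
#include <deque>
#include <vector>

struct Graph {
    std::vector<std::vector<int>> adj;  // adjacency lists
    std::vector<int> label;             // per-node data read/written by the operator
};

// The operator: acts on the neighborhood of one active node. Lowering a
// neighbor's label activates that neighbor, generating new work.
void relabelOperator(Graph& g, int node, std::deque<int>& worklist) {
    for (int nbr : g.adj[node]) {
        if (g.label[node] < g.label[nbr]) {
            g.label[nbr] = g.label[node];
            worklist.push_back(nbr);    // newly activated element
        }
    }
}

int main() {
    // Toy undirected graph with two components: {0,1,2} and {3,4}.
    Graph g;
    g.adj = {{1}, {0, 2}, {1}, {4}, {3}};
    g.label = {0, 1, 2, 3, 4};          // initially every node is its own label

    // Unordered algorithm: all nodes start active. Any serialization of the
    // operator applications is correct, so applications on non-overlapping
    // neighborhoods could run in parallel (amorphous data-parallelism); here
    // they are simply executed sequentially from a worklist.
    std::deque<int> worklist = {0, 1, 2, 3, 4};
    while (!worklist.empty()) {
        int node = worklist.front();
        worklist.pop_front();
        relabelOperator(g, node, worklist);
    }

    for (int v = 0; v < (int)g.label.size(); ++v)
        std::printf("node %d -> component %d\n", v, g.label[v]);
    return 0;
}
```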