IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Special Section on Foundations of Computer Science — New Trends in Theory of Computation and Algorithm —
Asynchronous Memory Machine Models with Barrier Synchronization
Koji NAKANO

2014 Volume E97.D Issue 3 Pages 431-441

Abstract

The Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM) are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. On the DMM and the UMM, warps (groups of threads) are assumed to work synchronously in a round-robin manner. However, warps work asynchronously on real GPUs, in the sense that they are randomly (or arbitrarily) dispatched for execution. The first contribution of this paper is to introduce asynchronous versions of these models in which warps are arbitrarily dispatched. In addition, we assume that threads can execute the “syncthreads” instruction for barrier synchronization. Since the barrier synchronization operation may be costly, we should evaluate and minimize the number of barrier synchronization steps executed by parallel algorithms. The second contribution of this paper is to present a parallel algorithm that computes the sum of n numbers in optimal computing time using few barrier synchronization steps. Our parallel algorithm computes the sum of n numbers in O(n/w + l log n) time units and O(log l/log w + log log w) barrier synchronization steps using wl threads on the asynchronous UMM with width w and latency l. Since the computation of the sum takes at least Ω(n/w + l log n) time units, this algorithm is time optimal. Finally, we show that the prefix-sums of n numbers can also be computed in O(n/w + l log n) time units and O(log l/log w + log log w) barrier synchronization steps using wl threads.
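As a concrete illustration of the barrier-synchronized summation pattern the abstract refers to, the following CUDA sketch sums n numbers with a block-level tree reduction in shared memory, where __syncthreads() separates the rounds. This is a minimal sketch under assumed parameters (kernel name blockSum, block size 256, and the host-side driver code are chosen for illustration); it shows the generic syncthreads-based reduction only and is not the paper's O(n/w + l log n) algorithm for the asynchronous UMM.

// Minimal sketch (not the paper's algorithm): a block-level tree reduction
// in CUDA. Each round halves the number of active threads and is separated
// from the next round by one barrier synchronization (__syncthreads()).
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed block size, for illustration only

__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float buf[BLOCK_SIZE];        // shared memory (DMM-like)
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid; // global memory index (UMM-like)

    buf[tid] = (idx < n) ? in[idx] : 0.0f;
    __syncthreads();                         // barrier before the first round

    // log(BLOCK_SIZE) reduction rounds, one barrier per round
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];  // one partial sum per block
}

int main() {
    const int n = 1 << 20;
    const int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    std::vector<float> h_in(n, 1.0f);        // sum of n ones, for checking

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);

    std::vector<float> h_out(blocks);
    cudaMemcpy(h_out.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    double total = 0.0;
    for (float v : h_out) total += v;        // reduce per-block sums on the host
    printf("sum = %.1f (expected %d)\n", total, n);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

In this sketch the per-block partial sums are combined on the host; the paper's concern is instead how many such barrier-separated rounds are needed in total, and how the latency l and width w of the machine model enter that count.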

© 2014 The Institute of Electronics, Information and Communication Engineers