Faster output-sensitive parallel algorithms for 3D convex hulls and vector maxima

https://doi.org/10.1016/S0743-7315(03)00035-2

Abstract

In this paper we focus on the problem of designing very fast parallel algorithms for the convex hull and the vector maxima problems in three dimensions that are output-size sensitive. Our algorithms achieve O(log log² n · log h) parallel time and optimal O(n log h) work with high probability in the CRCW PRAM, where n and h are the input and output sizes, respectively. These bounds are independent of the input distribution and are faster than those of the previously known algorithms. We also present an optimal speed-up (with respect to the input size only) sublogarithmic time algorithm that uses a super-linear number of processors for vector maxima in three dimensions.

Introduction

The study of algorithms is mainly concerned with developing algorithms for well-defined problems that are close to the best possible (for the problem at hand). The most common measure of the performance of an algorithm is its running time. For any non-trivial problem, the running time of an algorithm increases monotonically with the size of the input. Hence the efficiency of an algorithm is usually measured as a function of the input size. However, for many geometric problems, additional parameters like the size of the output capture the complexity of the problem more accurately, enabling us to design superior algorithms. Algorithms whose running times are sensitive to the output size have been actively pursued for problems like convex hulls [Cha95], [CM92], [CS89], [Mat92], [Sei86].

The primary objective of designing parallel algorithms is to obtain very fast running times while keeping the total work (the processor-time product) close to that of the best sequential algorithms. For problems for which output-size sensitive sequential algorithms are known, it becomes especially important to have similar parallel algorithms. Unless we design parallel algorithms that are output-sensitive, the corresponding (output-sensitive) sequential algorithm may be faster for many instances. (We will often use the shorter term output-sensitive instead of output-size sensitive.)

The task of designing output-size sensitive algorithms on parallel machines is even more challenging than designing their sequential counterparts. Not only is the output size an unknown parameter, but we also have to rapidly eliminate input points that do not contribute to the final output without incurring a high cost. The divide-and-conquer method is not directly effective, as it cannot divide the output evenly; herein lies the crux of the difficulty in designing ‘fast’ output-sensitive algorithms. By ‘fast’ we mean O(log h) or something very close, where h is the output size. The sequential output-size sensitive algorithms often use techniques like gift wrapping or incremental construction, which are inherently sequential.

The parallel random access machine (PRAM) has been the most popular model for designing algorithms because of its close relationship with the sequential models. For example, if S(n) is the best-known sequential time complexity for input size n, then we aim for a parallel algorithm with P(n) processors and running time T(n) so as to minimize T(n) subject to keeping the product P(n)·T(n) close to O(S(n)). A parallel algorithm that actually does total work O(S(n)) is called a work-optimal algorithm. If, in addition, one can also match the time lower bound, then the algorithm is the best possible (theoretically).

The fastest possible time bound clearly depends on the parallel machine model. For example, in the concurrent-read exclusive-write (CREW) PRAM model, the convex hull cannot be constructed faster than Ω(log n) time, because of an Ω(log n) lower bound for computing the maximum (minimum). Note that this bound is independent of the output size. However, this bound is not applicable to the CRCW model. For the parallel algebraic decision-tree model, Sen [Sen97] has obtained a trade-off between the number of processors and the possible speed-up for a wide range of problems in computational geometry. For convex hulls, the following is known.

Lemma 1.1 Sen [Sen97]

Any randomized algorithm in the parallel bounded-degree decision-tree model for constructing the convex hull of n points with output size h has a parallel time bound of Ω(log h / log k) using kn processors, k > 1, in the worst case.

In other words, for a super-linear number of processors, a proportional speed-up is not achievable, and hence such parallel algorithms cannot be considered efficient. The best that one can hope for under the circumstances is an algorithm that achieves O(log h) time using n processors.

Our algorithms are designed for the CRCW PRAM model. Although we always work in a Euclidean space E^d, we are mainly interested in the combinatorial complexity of the algorithm. We assume that a single memory location can hold a real number and that a processor can perform an arithmetic operation involving real numbers in a single step. In reality there may be numerical errors due to finite word lengths, which we ignore; this is a non-trivial issue and deserves a separate treatment. This model is consistent with the model used for sequential computational geometry, the ‘Real’-RAM. For our randomized algorithms, we further assume that processors have access to O(log n) random bits in constant time.

Convex polytopes are very important objects in computational geometry and, in a large number of cases, give rise to very efficient algorithms because of their defining property, convexity.

Definition 1.1

Given a set S = {p_1, p_2, …, p_n} of n points in E^d (the Euclidean d-dimensional space), the convex hull of S is the smallest convex polytope containing all the points of S. The convex hull problem is to determine the ordered list CH(S) of the points of S defining the boundary of the convex hull of S.

In two dimensions, the ‘ordering’ in the output is simply the clockwise (or counter-clockwise) ordering of the vertices of the hull. In three dimensions it can be the cyclic ordering of the hull edges around each hull vertex.
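For concreteness, the two-dimensional output convention can be illustrated by the following short sketch (ours, using the standard monotone-chain method, not the algorithm of this paper), which returns the hull vertices in counter-clockwise order:

    # A minimal 2D illustration of the output convention, via Andrew's
    # monotone chain (a standard textbook method, not this paper's
    # algorithm). Input: a list of (x, y) tuples.
    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def convex_hull_2d(points):
        pts = sorted(set(points))
        if len(pts) <= 2:
            return pts
        lower, upper = [], []
        for p in pts:                  # lower chain, left to right
            while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
                lower.pop()
            lower.append(p)
        for p in reversed(pts):        # upper chain, right to left
            while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
                upper.pop()
            upper.append(p)
        # concatenation gives the hull vertices in counter-clockwise order
        return lower[:-1] + upper[:-1]

For example, convex_hull_2d([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]) returns [(0, 0), (2, 0), (2, 2), (0, 2)].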

The problem of constructing convex hulls has attracted a great deal of attention since the inception of computational geometry. In three dimensions, Preparata and Hong [PH77] described the first O(n log n) time algorithm. Clarkson and Shor [CS89] presented the first (randomized) optimal output-sensitive algorithm, which ran in O(n log h) expected time, where h is the number of hull vertices. Their algorithm was subsequently derandomized optimally by Chazelle and Matoušek [CM92]. Chan [Cha95] presented a very elegant approach for output-sensitive construction of convex hulls using ray shooting that achieves optimal Θ(n log h) running times in dimensions two and three. In higher dimensions, the quest is still on to design optimal output-sensitive algorithms. Recently, Chan [Cha95] designed an output-sensitive algorithm for convex hulls in d dimensions which runs in O(n log^{O(1)} h + h^⌊d/2⌋) time. Seidel [Sei86] computes F faces of the convex hull of n points in a fixed dimension d in O(n² + F log n) time. This can be slightly improved to O(n^{2−2/(⌊d/2⌋+1)+ε} + F log n), for any fixed ε > 0, using a technique of Matoušek [Mat92]. In a recent paper, Amato and Ramos [AR96] have shown that the convex hull of n points in R^{d+1}, 2 ⩽ d ⩽ 4, can be computed in O((n+F) log^{d−1} F) time.

In the context of parallel algorithms for convex hulls in three dimensions (3-D hulls), Chow [Cho80] described an O(log³ n) time algorithm using n CREW processors. An O(log² n log* n) time algorithm using n processors was obtained by Dadoun and Kirkpatrick [DK89], and an O(log² n) time algorithm was designed by Amato and Preparata [AP92]. Reif and Sen [RS92] presented an O(log n) time and O(n log n) work randomized algorithm, which was the first known worst-case optimal algorithm for three-dimensional hulls in the CREW model. Their algorithm deals with the dual equivalent of the convex hull problem, namely the intersection of half-spaces in three dimensions. It is based on divide-and-conquer and works in O(log n) phases, pruning away the redundant half-spaces and keeping the work linear in each phase. This algorithm was derandomized by Goodrich [Goo93], who obtained an O(log² n) time, O(n log n) work method for the EREW PRAM. In the context of output-size sensitive algorithms, one faces the problem of dividing the output points evenly so that one can finish in O(log h) phases, keeping the work linear in each phase, especially when the output size is not known. In [GS97], the authors have shown that convex hulls in two dimensions can be constructed in O(log log n · log h) time with optimal work with high probability. In three dimensions, Goodrich and Ghouse [GG91] described an O(log² n) expected time, O(min{n log² h, n log n}) work method, which is output-sensitive but not work-optimal. More recently, Amato et al. [AGR94] gave a deterministic O(log³ n) time, O(n log h) work algorithm for convex hulls in R³ on the EREW PRAM.

In higher dimensions, Amato et al. [AGR94] have shown that the convex hull of n points in R^d can be constructed in O(log n) time with O(n log n + n^⌊d/2⌋) work with high probability, and in O(log n) time with O(n^⌊d/2⌋ log^{c(⌈d/2⌉−⌊d/2⌋)} n) work deterministically, where c > 0 is a constant.

In this paper, we present a randomized algorithm for convex hulls in three dimensions that is faster for all output sizes while maintaining the optimal output-sensitive work bound. The algorithm exploits an observation of Clarkson and Shor [CS89], namely the iterative pruning of non-extreme points. We also make use of a number of sophisticated techniques, like bootstrapping and super-linear-processor parallel algorithms for convex hulls [Sen97], combined with a very fine-tuned analysis.

Let T be a set of vectors in R^d. The partial order ⩽_M is defined for two vectors x = (x_1, x_2, …, x_d) and y = (y_1, y_2, …, y_d) as follows: x ⩽_M y (x is dominated by y) if and only if x_i ⩽ y_i for all 1 ⩽ i ⩽ d.

Definition 1.2

A vector v ∈ T is said to be maximal if it is not dominated by any other vector w ∈ T. The problem of maximal vectors is to determine all maximal vectors in a given set of input vectors.
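As a direct illustration of this definition (our own sketch; the paper's algorithms are, of course, far more efficient), a brute-force O(n²d) method simply tests every vector against every other:

    # Brute-force maxima straight from Definition 1.2 (an O(n^2 d)
    # illustration only; vectors are tuples of numbers).
    def dominates(y, x):
        # True iff x <=_M y with x != y, i.e. x_i <= y_i for all i
        return x != y and all(xi <= yi for xi, yi in zip(x, y))

    def maxima(vectors):
        return [v for v in vectors
                if not any(dominates(w, v) for w in vectors)]

For instance, maxima([(1, 2, 3), (3, 2, 1), (1, 1, 1)]) returns [(1, 2, 3), (3, 2, 1)], since (1, 1, 1) is dominated by both of the other vectors.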

Relatively little work has been done in the context of parallel algorithms for the vector maxima problem. Efficient sequential algorithms are known for two and three dimensions (the O(n log n)-time algorithm is worst-case optimal). However, for h ∈ o(log n) (h is the number of maximal vectors), a better (O(nh) time) algorithm can easily be obtained in two dimensions by finding the point with maximum y-coordinate, deleting all the points dominated by it, and repeating on the reduced problem; a sketch follows this paragraph. This algorithm takes linear time for constant h. Kirkpatrick and Seidel presented an O(n log h) time algorithm and showed that the bound is tight with respect to the input and output sizes [KS86]. In the context of parallel algorithms for two dimensions, an O(log n) time optimal algorithm can be obtained easily by using any (n, log n) sorting algorithm followed by a straightforward divide-and-conquer. The best known result for three-dimensional maxima is O(log n) time and O(n log n) operations, due to Atallah et al. [ACG89], who used parallel merging techniques. Using the techniques of [GS97], an O(log log n · log h) time, optimal-work algorithm can be obtained for vector maxima in two dimensions as well. We present an algorithm for vector maxima in three dimensions.
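The simple output-sensitive method just described can be sketched as follows (our own rendering):

    # The simple O(nh)-time 2D maxima method described above: each
    # round reports one maximal point (the remaining point with the
    # largest y, ties broken by larger x) and deletes everything it
    # dominates, so the number of rounds equals the output size h.
    def maxima_2d(points):
        remaining = list(points)
        result = []
        while remaining:
            top = max(remaining, key=lambda p: (p[1], p[0]))
            result.append(top)
            # every remaining point has y <= top's y, so a point is
            # dominated by top exactly when its x-coordinate is <= top's
            remaining = [p for p in remaining if p[0] > top[0]]
        return result

Each iteration costs O(n) and reports one maximal point, so the total time is O(nh); in particular it is linear for constant h.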

Our algorithms are randomized; they always provide correct output, and the running-time bounds hold with high probability, independent of the input distribution. Such algorithms are often referred to as Las Vegas algorithms. The term high probability means probability exceeding 1 − 1/n^c for any predetermined constant c, where n is the input size. We will use the notation Õ instead of O to denote that a bound holds with high probability.

Our algorithms achieve O(log log² n · log h) time with optimal O(n log h) work with high probability. This is one of the first non-trivial applications of super-linear-processor algorithms in computational geometry to obtain speed-up in a situation where initially there is no processor advantage. Our work establishes a close connection between fast output-sensitive parallel algorithms and super-linear-processor algorithms. Consequently, our algorithms become increasingly faster than the previous algorithms as the output size decreases. We are not aware of any previous work where parallel algorithms speed up optimally with the output size in the sublogarithmic time domain.

We also present an optimal speed-up (with respect to the input size only) sublogarithmic time algorithm that uses a super-linear number of processors for vector maxima in three dimensions.


Some known useful results

Definition 2.1

For all n, m ∈ N and λ ⩾ 1, the m-color semi-sorting problem of size n and with slack λ is defined as follows: Given n integers (colors) x_1, …, x_n in the range 0, …, m, compute n non-negative integers y_1, …, y_n (the placement of x_i) such that (a sequential sketch is given after this definition):

  • (1)

    All the x_i of the same color are placed in contiguous locations (not necessarily consecutive).

  • (2)

    max{y_j : 1 ⩽ j ⩽ n} = O(λn).
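For intuition, a sequential sketch of semi-sorting follows (ours; the point of the parallel primitive is precisely that it can be realized much faster than a sequential pass, and the cited parallel routines are what the algorithms actually use):

    # Sequential sketch of m-color semi-sorting (for intuition only).
    # Buckets the indices by color, then assigns each color a
    # contiguous block of placements; here the slack is lambda = 1.
    from collections import defaultdict

    def semi_sort(colors):
        buckets = defaultdict(list)       # color -> indices of that color
        for i, c in enumerate(colors):
            buckets[c].append(i)
        placement = [0] * len(colors)
        nxt = 0
        for c, idxs in buckets.items():   # one contiguous block per color
            for i in idxs:
                placement[i] = nxt
                nxt += 1
        return placement                  # max placement <= n - 1 = O(λn)

For example, semi_sort([2, 0, 2, 1]) returns [0, 2, 1, 3]: the two elements of color 2 receive the contiguous placements 0 and 1.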

Definition 2.2

For all n ∈ N, the interval allocation problem is defined as follows: Given n non-negative integers x_1, …, x_n, compute n non-negative integers y_1, …, y_n (the

Brief overview of the algorithm for convex hulls

The problem of constructing the convex hull of points in three dimensions is well known to be equivalent to the problem of finding the intersection of half-spaces. Here we give an algorithm for the latter which implies a solution for the former.
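The duality underlying this equivalence is standard; one common formulation (a textbook version, not necessarily the exact transform used in this paper) is the following:

    % Point-plane duality in three dimensions (a standard textbook
    % formulation; the paper may use a different but equivalent map).
    \[
      p = (a, b, c) \quad\longleftrightarrow\quad
      p^{*}:\; z = a x + b y - c .
    \]
    % The map preserves above/below relations: p lies above q* if and
    % only if q lies above p*. Hence the upper hull of a point set
    % corresponds to the lower envelope of the dual planes, i.e. the
    % intersection of the half-spaces below them (and symmetrically
    % for the lower hull), so the two problems are equivalent.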

Let us denote the input set of half-spaces by S and their intersection by P(S). We construct the intersection P(R) of a random sample R of r half-spaces and filter out the redundant half-spaces, i.e., the half-spaces that do not contribute to P(S).

Random sampling lemma

Let H(R) denote the set of cones induced by a sample R and let H*(R) denote the set of critical cones. We will denote the set of half-spaces intersecting a cone △ ∈ H(R) by L(△) and its cardinality |L(△)| by l(△). L(△) will also be referred to as the conflict list of △, and l(△) as its conflict size. We will use the following results related to bounding the size of the reduced problem.

Lemma 4.1

Clarkson and Shor [CS89], Rajasekaran and Sen [RS93]

For some suitable constant k and large n,

    Pr[ ∑_{△ ∈ H(R)} l(△) ⩾ kn ] ⩽ 1/4,

where the probability is taken over all possible choices of the sample R.

Algorithm

We give below a general algorithm for both the convex hull and the vector maxima problems. In later sections we describe how each step is carried out in the context of each specific problem, and then give a combined analysis.

Let S be the input set of n objects. The objects are half-spaces for the half-space intersection (convex hull) problem, and vectors (points) for the vector maxima problem. The algorithm is iterative; a schematic sketch is given below. Let n_i (respectively r_i) denote the size of the problem
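The overall iterative structure can be summarized by the following schematic (our own sketch; build_structure, find_conflicts, is_critical and brute_force are hypothetical placeholders for the problem-specific subroutines developed in the next sections, and the constants are illustrative rather than the paper's choices):

    import random

    # Schematic of the iterative sample-and-prune framework (a sketch).
    def sample_and_prune(S, build_structure, find_conflicts,
                         is_critical, brute_force, cutoff=32):
        active = list(S)
        while len(active) > cutoff:
            r = max(2, int(len(active) ** 0.25))   # sample of size ~ n_i^ε
            R = random.sample(active, r)
            regions = build_structure(R)           # cones / regions of P(R)
            conflicts = {t: find_conflicts(active, t) for t in regions}
            critical = [t for t in regions
                        if is_critical(t, conflicts[t])]
            # prune: keep only objects that conflict with a critical region
            keep = set()
            for t in critical:
                keep.update(conflicts[t])
            active = [s for s in active if s in keep]
        return brute_force(active)

The work bound hinges on the conflict lists having total size O(n_i) (Lemma 4.1) and on the pruning step reducing the problem size rapidly.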

Convex hulls

In order to get a fast algorithm we must be able to determine the intersections of the half-spaces with the cones and determine the critical cones quickly.

Vector maxima

Let S be the set of input vectors. We pick a random sample R of the input vectors, compute its maxima, and define regions. We say that a vector conflicts with a region if it can potentially dominate the vectors in that region. We determine the critical regions, i.e., the regions containing an output vector, delete the vectors not conflicting with any critical region, and iterate on the reduced problem. As in the case of convex hulls, in order to get a fast algorithm we must be able to detect

Analysis

Assume that h = O(n^δ) for some δ between 0 and 1, for otherwise the problem can be solved in O(∑ log r_i) = O(log n) = O(log h) time with O(n log h) work.
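These equalities can be seen as follows (a short justification we add for completeness, assuming the usual doubling schedule r_{i+1} = r_i² for the sample sizes; the paper's exact schedule may differ):

    % With r_{i+1} = r_i^2 we have log r_i = 2^i log r_0, so the sum
    % grows geometrically and is dominated by its last term:
    \[
      \sum_{i=0}^{k} \log r_i \;=\; \log r_0 \sum_{i=0}^{k} 2^{i}
      \;\le\; 2^{k+1} \log r_0 \;=\; 2 \log r_k \;=\; O(\log n).
    \]
    % Moreover, if h = \Omega(n^{\delta}) for some fixed \delta > 0,
    % then \log n \le (1/\delta) \log h, so O(\log n) = O(\log h).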

Since r_i = O(n^ε), ε < 1, Step 2 can be done in constant time by a brute-force method followed by sorting with r_i² processors. Step 3(a)i (finding conflicts) can be done in O(log r_i / log(p/n_i)) time using p processors, by Lemma 6.2 for convex hulls and by Lemma 7.1 for vector maxima. An application of semi-sorting then gives us the set of input objects

Sublogarithmic algorithm for 3D maxima

In this section we present a sublogarithmic time algorithm that achieves optimal speed-up (with respect to the input size only) and uses a super-linear number of processors for the vector maxima problem in three dimensions.

References (25)

  • B. Chazelle, J. Matoušek, Derandomizing an output-sensitive convex-hull algorithm in three dimensions, Tech. Report,...
  • A. Chow, Parallel algorithms for geometric problems, Ph.D. Thesis, University of Illinois, Urbana-Champaign,...

Some of the results of this paper appeared in a preliminary version [GS96] in the Twelfth Annual Symposium on Computational Geometry, 1996, Philadelphia, USA.
