ABSTRACT
We consider the problem of approximating the distance between two $d$-dimensional vectors $x$ and $y$ in the data stream model. In this model, the $2d$ coordinates are presented as a "stream" of data in some arbitrary order, where each data item includes the index and value of some coordinate and a bit that identifies the vector ($x$ or $y$) to which it belongs. The goal is to minimize the amount of memory needed to approximate the distance. For the case of $L_p$-distance with $p \in [1,2]$, there are good approximation algorithms that run in space polylogarithmic in $d$ (here we assume that each coordinate is an integer with $O(\log d)$ bits). Here we prove that no such algorithms exist for $p > 2$. In particular, we prove an optimal approximation-space tradeoff for approximating the $L_\infty$ distance of two vectors. We show that any randomized algorithm that approximates the $L_\infty$ distance of two length-$d$ vectors within a factor of $d^{\delta}$ requires $\Omega(d^{1-4\delta})$ space. As a consequence, we show that for $p > 2/(1-4\delta)$, any randomized algorithm that approximates the $L_p$ distance of two length-$d$ vectors within a factor of $d^{\delta}$ requires $\Omega(d^{1-\frac{2}{p}-4\delta})$ space. The lower bound follows from a lower bound on the two-party one-round communication complexity of this problem. This lower bound is proved using a combination of information theory and Fourier analysis.
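To make the stream format concrete, the following is a minimal Python sketch of the trivial baseline, not the paper's method: it stores the full difference vector $x - y$, so it uses memory linear in $d$ (the amount the lower bounds say cannot be improved by much for large $p$). The item layout `(index, value, which)` and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical stream item: (coordinate index, value, "x" or "y").
StreamItem = Tuple[int, int, str]


def difference_from_stream(stream: Iterable[StreamItem]) -> Dict[int, int]:
    """Accumulate x - y coordinate by coordinate, in any arrival order.

    This is the naive approach: it keeps one counter per coordinate,
    so its memory grows linearly with d.
    """
    diff: Dict[int, int] = defaultdict(int)
    for index, value, which in stream:
        # A coordinate of x adds to the difference; a coordinate of y subtracts.
        diff[index] += value if which == "x" else -value
    return diff


def lp_distance(diff: Dict[int, int], p: float) -> float:
    """L_p norm of the difference vector, for finite p >= 1."""
    return sum(abs(v) ** p for v in diff.values()) ** (1.0 / p)


def linf_distance(diff: Dict[int, int]) -> int:
    """L_infinity norm: the largest absolute coordinate difference."""
    return max((abs(v) for v in diff.values()), default=0)


# Example: x = (3, 0, 5) and y = (1, 4, 5), with the 2d = 6 items interleaved.
stream = [(2, 5, "y"), (0, 3, "x"), (1, 4, "y"), (0, 1, "y"), (2, 5, "x"), (1, 0, "x")]
diff = difference_from_stream(stream)
print(lp_distance(diff, 1), lp_distance(diff, 2), linf_distance(diff))
```

Here the difference vector is $(2, -4, 0)$, so the $L_1$ distance is 6 and the $L_\infty$ distance is 4. The small-space algorithms for $p \in [1,2]$ mentioned above replace the per-coordinate counters with a much smaller randomized summary; this sketch does not attempt that.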