DOI: 10.1145/509907.509966
Article

Fast, small-space algorithms for approximate histogram maintenance

Published: 19 May 2002

ABSTRACT

A vector $A$ of length $N$ is defined implicitly, via a stream of updates of the form "add 5 to $A_3$." We give a sketching algorithm that constructs a small sketch from the stream of updates, and a reconstruction algorithm that produces a $B$-bucket piecewise-constant representation (histogram) $H$ for $A$ from the sketch, such that $\|A - H\| \le (1+\epsilon)\,\|A - H_{\mathrm{opt}}\|$, where the error $\|A - H\|$ is either $\ell_1$ (absolute) or $\ell_2$ (root-mean-square) error. The time to process a single update, the time to reconstruct the histogram, and the size of the sketch are each bounded by $\mathrm{poly}(B, \log N, \log\|A\|, 1/\epsilon)$. Our result is obtained in two steps. First, we obtain what we call a robust histogram approximation for $A$: a histogram such that adding a small number of buckets does not significantly improve the representation quality. Second, from the robust histogram we cull a histogram of the desired accuracy with $B$ buckets. This technique also provides similar results for Haar wavelet representations under $\ell_2$ error. Our results have applications in summarizing data distributions quickly and succinctly, even in distributed settings.
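
To make the approximation target concrete, the following Python sketch computes the exact optimum $H_{\mathrm{opt}}$ under squared $\ell_2$ error by storing $A$ explicitly and running a standard dynamic program over bucket boundaries. It is not the sketching algorithm described in the abstract: it uses space linear in $N$ and time $O(N^2 B)$, which is precisely the cost the small-space algorithm is designed to avoid. The function names `apply_updates` and `optimal_histogram`, the `(index, delta)` update format, and the 0-based indexing are assumptions made for illustration.

```python
from itertools import accumulate


def apply_updates(n, updates):
    """Materialize A (length n) from a stream of (index, delta) updates.

    Assumption: indices are 0-based. A small-space algorithm would never
    store A like this; it is done here only to define the exact optimum.
    """
    a = [0.0] * n
    for index, delta in updates:
        a[index] += delta
    return a


def optimal_histogram(a, b):
    """Best b-bucket piecewise-constant fit of a under squared l2 error.

    Returns (squared_error, buckets), where buckets is a list of
    (start, end_exclusive, value) triples covering a from left to right.
    """
    n = len(a)
    b = min(b, n)  # more buckets than entries cannot help
    prefix = [0.0] + list(accumulate(a))
    prefix_sq = [0.0] + list(accumulate(x * x for x in a))

    def bucket_error(i, j):
        # Squared error of representing a[i:j] by its mean.
        s = prefix[j] - prefix[i]
        return max(0.0, (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i))

    inf = float("inf")
    # cost[k][j]: least squared error covering a[:j] with exactly k buckets.
    cost = [[inf] * (n + 1) for _ in range(b + 1)]
    cut = [[0] * (n + 1) for _ in range(b + 1)]
    cost[0][0] = 0.0
    for k in range(1, b + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + bucket_error(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    cut[k][j] = i

    # Walk the recorded cut points backwards to recover the buckets.
    buckets = []
    j = n
    for k in range(b, 0, -1):
        i = cut[k][j]
        buckets.append((i, j, (prefix[j] - prefix[i]) / (j - i)))
        j = i
    buckets.reverse()
    return cost[b][n], buckets


if __name__ == "__main__":
    stream = [(3, 5), (0, 1), (1, 1), (3, 5), (2, 1)]
    a = apply_updates(6, stream)
    print(optimal_histogram(a, 3))
```

With the example stream, `a` becomes `[1, 1, 1, 10, 0, 0]`, and three buckets represent it exactly. A histogram $H$ returned by an approximation algorithm would be judged against the error this baseline reports for the same $B$.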

        • Published in

          STOC '02: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
          May 2002
          840 pages
          ISBN:1581134959
          DOI:10.1145/509907

          Copyright © 2002 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Article

          Acceptance Rates

          STOC '02 Paper Acceptance Rate: 91 of 287 submissions, 32%. Overall Acceptance Rate: 1,469 of 4,586 submissions, 32%.
