DOI: 10.1145/1250790.1250891
Article

Lower bounds for randomized read/write stream algorithms

Published: 11 June 2007

ABSTRACT

Motivated by the capabilities of modern storage architectures, we consider a generalization of the data stream model in which the algorithm has sequential access to multiple streams. Unlike the data stream model, where the stream is read-only, in this new model (introduced in [8,9]) the algorithm can also write onto the streams. There is no limit on the size of the streams, but the number of passes made over the streams is restricted. On the other hand, the amount of internal memory used by the algorithm is scarce, as in the data stream model.

We resolve the main open problem in [7] of proving lower bounds in this model for algorithms that are allowed to have 2-sided error. Previously, such lower bounds were shown only for deterministic and 1-sided error randomized algorithms [9,7]. We consider the classical set disjointness problem, which has proved to be invaluable for deriving lower bounds for many other problems involving data streams and other randomized models of computation. For this problem, we show a near-linear lower bound on the size of the internal memory used by a randomized algorithm with 2-sided error that is allowed to have o(log N/log log N) passes over the streams. This bound is almost optimal since there is a simple algorithm that can solve this problem using logarithmic memory if the number of passes over the streams is allowed to be O(log N).
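As an illustration of the kind of simple algorithm meant here, the following Python sketch decides set disjointness by merge sorting across streams and then scanning once for equal neighbours. It is a minimal sketch under stated assumptions, not the construction from the paper: Python lists stand in for the read/write streams, they are only scanned front to back or appended to, and the function names (`split_runs`, `merge_runs`, `sort_with_streams`, `disjoint`) are hypothetical. Each round makes a constant number of sequential scans and doubles the sorted run length, so the number of rounds is logarithmic in the input length N, while the working state is a few items and counters.

```python
from typing import List, Sequence, Tuple

Item = Tuple[int, int]  # (value, tag): tag 0 = element of A, tag 1 = element of B


def split_runs(src: List[Item], run: int) -> Tuple[List[Item], List[Item]]:
    """One sequential scan: copy runs of length `run` alternately to two streams."""
    outs: Tuple[List[Item], List[Item]] = ([], [])
    for i, item in enumerate(src):
        outs[(i // run) % 2].append(item)
    return outs


def merge_runs(left: List[Item], right: List[Item], run: int) -> List[Item]:
    """One sequential scan of each stream: merge the k-th runs pairwise."""
    out: List[Item] = []
    i = j = 0  # read heads; they only ever move forward
    while i < len(left) or j < len(right):
        i_end = min(i + run, len(left))
        j_end = min(j + run, len(right))
        while i < i_end and j < j_end:
            if left[i] <= right[j]:
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        # Copy whatever remains of the current pair of runs, in order.
        out.extend(left[i:i_end])
        i = i_end
        out.extend(right[j:j_end])
        j = j_end
    return out


def sort_with_streams(stream: List[Item]) -> Tuple[List[Item], int]:
    """Bottom-up merge sort; returns the sorted stream and the number of rounds."""
    run, rounds = 1, 0
    while run < len(stream):
        left, right = split_runs(stream, run)
        stream = merge_runs(left, right, run)
        run *= 2
        rounds += 1
    return stream, rounds


def disjoint(a: Sequence[int], b: Sequence[int]) -> Tuple[bool, int]:
    """Return (A and B are disjoint, number of sorting rounds used)."""
    stream = [(x, 0) for x in a] + [(y, 1) for y in b]
    stream, rounds = sort_with_streams(stream)
    prev = None
    for item in stream:  # final scan: a shared value shows up as neighbours
        if prev is not None and prev[0] == item[0] and prev[1] != item[1]:
            return False, rounds
        prev = item
    return True, rounds
```

For example, `disjoint([3, 1, 4], [2, 7, 5])` sorts six tagged items in three rounds and finds no shared value, whereas `disjoint([3, 1, 4], [9, 4])` detects the common element 4 in the final scan.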

Applications include near-linear lower bounds on the internal memory for well-known problems in the literature: (1) approximately counting the number of distinct elements in the input (F0); (2) approximating the frequency of the mode of an input sequence (F*); (3) computing the join of two relations; and (4) deciding if some node of an XML document matches an XQuery (or XPath) query. Our techniques involve a novel direct-sum type of argument that yields lower bounds for many other problems. Our results asymptotically improve previously known bounds for any problem even in deterministic and 1-sided error models of computation.
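One elementary way to see why counting distinct elements is tied to set disjointness is sketched below in Python. It assumes exact counting and uses a hypothetical helper name, `disjoint_via_distinct_count`; the lower bound stated above concerns approximate counting, whose reduction is more delicate and is not reproduced here.

```python
from typing import Iterable


def disjoint_via_distinct_count(a: Iterable[int], b: Iterable[int]) -> bool:
    """Decide disjointness of two sets from an exact distinct-element count.

    Assumption: F0 is computed exactly here; this only illustrates the
    connection, not the approximate-counting setting of the abstract.
    """
    set_a, set_b = set(a), set(b)
    # F0 of the stream formed by writing A followed by B.
    f0 = len(set(list(set_a) + list(set_b)))
    # No element is counted twice exactly when the sets share nothing.
    return f0 == len(set_a) + len(set_b)
```

Any procedure that computes F0 exactly on the concatenated stream therefore also decides disjointness, which is the intuition behind transferring a memory lower bound from one problem to the other.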

References

  1. G. Aggarwal, M. Datar, S. Rajagopalan, and M. Ruhl. On the streaming model augmented with a sorting primitive. In 45th Symposium on Foundations of Computer Science (FOCS), pages 540-549, 2004.
  2. L. Babai, P. Frankl, and J. Simon. Complexity classes in communication complexity theory. In 27th Annual Symposium on Foundations of Computer Science (FOCS), pages 337-347, Toronto, Ontario, Oct. 1986. IEEE.
  3. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proceedings of the 21st ACM Symposium on Principles of Database Systems (PODS), pages 1-16, 2002.
  4. P. Beame, M. Saks, X. Sun, and E. Vee. Time-space trade-off lower bounds for randomized computation of decision problems. Journal of the ACM, 50(2):154-195, 2003.
  5. J. Chen and C.-K. Yap. Reversal complexity. SIAM J. Comput., 20(4):622-638, 1991.
  6. B. Chor and O. Goldreich. Unbiased bits from sources of weak randomness and probabilistic communication complexity. SIAM J. Comput., 17(2):230-261, 1988.
  7. M. Grohe, A. Hernich, and N. Schweikardt. Randomized computations on large data sets: Tight lower bounds. In Proceedings of the 25th ACM Symposium on Principles of Database Systems (PODS), pages 243-252, 2006.
  8. M. Grohe, C. Koch, and N. Schweikardt. Tight lower bounds for query processing on streaming and external memory data. In Proceedings of the 32nd International Colloquium on Automata, Languages and Programming (ICALP), LNCS 3580, pages 1076-1088, 2005.
  9. M. Grohe and N. Schweikardt. Lower bounds for sorting with few random accesses to external memory. In Proceedings of the 24th ACM Symposium on Principles of Database Systems (PODS), pages 238-249, 2005.
  10. P. Indyk and D. P. Woodruff. Tight lower bounds for the distinct elements problem. In 44th Symposium on Foundations of Computer Science (FOCS), pages 283-292, 2003.
  11. S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2006.
  12. A. A. Razborov. On the distributional complexity of disjointness. Theoretical Computer Science, 106(2):385-390, 1992.
  13. M. Ruhl. Efficient Algorithms for New Computational Models. PhD thesis, Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2003.
  14. J. S. Vitter. External memory algorithms and data structures. ACM Comput. Surv., 33(2):209-271, 2001.
  15. D. P. Woodruff. Optimal space lower bounds for all frequency moments. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 167-175, 2004.
      Published in

      STOC '07: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
      June 2007
      734 pages
      ISBN: 9781595936318
      DOI: 10.1145/1250790

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,469 of 4,586 submissions, 32%
