DOI: 10.1145/2612669.2612695
research-article

Parallel streaming frequency-based aggregates

Published: 21 June 2014

ABSTRACT

We present efficient parallel streaming algorithms for fundamental frequency-based aggregates in both the sliding window and the infinite window settings. In the sliding window setting, we give a parallel algorithm for maintaining a space-bounded block counter (SBBC). Using SBBC, we derive algorithms for basic counting, frequency estimation, and heavy hitters that perform no more work than their best sequential counterparts. In the infinite window setting, we present algorithms for frequency estimation, heavy hitters, and count-min sketch. For both the infinite window and sliding window settings, our parallel algorithms process a "minibatch" of items using linear work and polylog parallel depth. We also prove a lower bound showing that the work of the parallel algorithm is optimal in the case of heavy hitters and frequency estimation. To our knowledge, these are the first parallel algorithms for these problems that are provably work efficient and have low depth.
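To make the minibatch idea concrete, here is a minimal sequential sketch in Python of a count-min sketch that is updated one minibatch at a time. It is not the paper's parallel algorithm: the class name `CountMinSketch`, the method `update_minibatch`, the hash construction, and the parameters `width`, `depth`, and `seed` are all assumptions made for illustration. The minibatch is first collapsed into a histogram, so each distinct item touches each row once. A parallel version would presumably build that histogram and apply the row updates in parallel, a step that is only indicated in comments here.

```python
import random
from collections import Counter


class CountMinSketch:
    """Illustrative count-min sketch: `depth` rows of `width` counters each."""

    _PRIME = (1 << 61) - 1  # modulus for the (assumed) linear hash functions

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row; h(x) = ((a * x + b) mod p) mod width.
        self.hashes = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(depth)
        ]

    def _bucket(self, row, item):
        a, b = self.hashes[row]
        return ((a * hash(item) + b) % self._PRIME) % self.width

    def update_minibatch(self, items):
        """Fold a whole minibatch into the sketch.

        The minibatch is first reduced to (item, count) pairs. This is the
        step a parallel implementation would carry out with a parallel
        grouping primitive; here it is done sequentially.
        """
        histogram = Counter(items)
        for item, count in histogram.items():
            for row in range(self.depth):
                self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        """Return an estimate that never undercounts the true frequency."""
        return min(
            self.table[row][self._bucket(row, item)]
            for row in range(self.depth)
        )


if __name__ == "__main__":
    sketch = CountMinSketch(width=64, depth=4, seed=7)
    sketch.update_minibatch([1, 2, 2, 3, 3, 3])
    sketch.update_minibatch([3, 3, 4])
    print(sketch.estimate(3))  # at least 5, the true count of item 3
```

Because every update only adds to counters, folding in a minibatch's histogram leaves the table in the same state as inserting the items one at a time, so the batched form changes the cost of an update but not the estimates.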


Published in

SPAA '14: Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures
June 2014
356 pages
ISBN: 9781450328210
DOI: 10.1145/2612669
General Chair: Guy Blelloch
Program Chair: Peter Sanders

Copyright © 2014 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

SPAA '14 paper acceptance rate: 30 of 122 submissions, 25%
Overall acceptance rate: 447 of 1,461 submissions, 31%
