ABSTRACT
We present efficient parallel streaming algorithms for fundamental frequency-based aggregates in both the sliding window and the infinite window settings. In the sliding window setting, we give a parallel algorithm for maintaining a space-bounded block counter (SBBC). Using SBBC, we derive algorithms for basic counting, frequency estimation, and heavy hitters that perform no more work than their best sequential counterparts. In the infinite window setting, we present algorithms for frequency estimation, heavy hitters, and count-min sketch. In both settings, our parallel algorithms process a "minibatch" of items using linear work and polylogarithmic parallel depth. We also prove a lower bound showing that the work of the parallel algorithm is optimal in the case of heavy hitters and frequency estimation. To our knowledge, these are the first parallel algorithms for these problems that are provably work-efficient and have low depth.
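To make the minibatch setting concrete, here is a minimal sequential sketch of infinite-window frequency estimation in the Misra-Gries style. It is not the parallel algorithm of the paper: the function name `mg_update`, the dictionary representation, and the parameter `k` (maximum number of counters kept) are assumptions chosen for illustration. Each call folds one minibatch into the summary by adding the minibatch histogram to the existing counters and then, if more than `k` counters remain, subtracting the (k+1)-th largest count from all of them.

```python
from collections import Counter


def mg_update(summary, minibatch, k):
    """Fold one minibatch into a Misra-Gries style summary of at most k counters.

    Hypothetical helper for illustration; sequential, not the paper's method.
    """
    merged = Counter(summary)
    merged.update(minibatch)  # add the minibatch histogram to existing counters
    if len(merged) <= k:
        return dict(merged)
    # Subtract the (k+1)-th largest count from every counter and keep only
    # the strictly positive ones; at most k counters can survive this step.
    threshold = sorted(merged.values(), reverse=True)[k]
    return {item: c - threshold for item, c in merged.items() if c > threshold}


def estimate(summary, item):
    """Return the (under)estimate of an item's frequency; 0 if not tracked."""
    return summary.get(item, 0)
```

Under these assumptions, after n items have been processed each estimate is at most the true frequency and falls short of it by at most n/(k+1), so any item occurring more than n/(k+1) times is still present in the summary. A parallel version would need to build the histogram and select the threshold with low depth, which this sketch does not attempt.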
Index Terms
- Parallel streaming frequency-based aggregates