Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time

Authors:
Kanat Tangwongsan, Mahidol University International College
Martin Hirzel, IBM Research
Scott Schneider, IBM Research
DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, June 2017, pages 66–77. https://doi.org/10.1145/3093742.3093925

Published: 08 June 2017

ABSTRACT

Sliding-window aggregation is a widely used approach for extracting insights from the most recent portion of a data stream. The aggregations of interest can usually be cast as binary operators that are associative, but they are not necessarily commutative or invertible. Non-invertible operators, however, are difficult to support efficiently. The best published algorithms require O(log n) aggregation steps per window operation, where n is the sliding-window size at that point. For a FIFO window, this can be improved to O(1) on average by using two aggregation stacks.

This paper presents DABA, a novel algorithm for aggregating FIFO sliding windows that significantly improves upon these time bounds. DABA requires only O(1) aggregation steps per operation in the worst case (not just on average). As such, DABA asymptotically improves the performance of sliding-window aggregation without restricting the operator to be invertible. Our experimental results demonstrate that these theoretical improvements hold in practice. DABA is a substantial improvement over the state of the art in terms of both latency and throughput.
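The two-stack baseline mentioned in the abstract can be illustrated with a minimal Python sketch. This is not DABA and is not taken from the paper's code: the class name `TwoStacksWindow` and the methods `insert`, `evict`, and `query` are hypothetical names chosen for illustration. The sketch assumes only that the operator is associative and has an identity element. Each operation costs O(1) aggregation steps on average, but an `evict` that triggers a flip costs O(n), which is the latency spike a worst-case O(1) algorithm avoids.

```python
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class TwoStacksWindow(Generic[T]):
    """FIFO sliding-window aggregator using two aggregation stacks.

    Illustrative sketch only. `combine` must be associative with
    `identity` as its neutral element; it does not need to be
    commutative or invertible.
    """

    def __init__(self, combine: Callable[[T, T], T], identity: T) -> None:
        self._combine = combine
        self._identity = identity
        # Each stack entry is (value, partial aggregate).
        # front: top is the oldest element; its aggregate covers that
        #        value and every newer value still stored in `front`.
        # back:  top is the newest element; its aggregate covers every
        #        value in `back`, oldest to newest.
        self._front: List[Tuple[T, T]] = []
        self._back: List[Tuple[T, T]] = []

    def insert(self, value: T) -> None:
        """Append the newest value: one aggregation step."""
        prev = self._back[-1][1] if self._back else self._identity
        self._back.append((value, self._combine(prev, value)))

    def evict(self) -> T:
        """Remove and return the oldest value.

        O(1) amortized; O(n) when `front` is empty and a flip is needed.
        """
        if not self._front:
            # Flip: move everything from `back` to `front`, newest first,
            # so that the oldest value ends up on top of `front`.
            while self._back:
                value, _ = self._back.pop()
                newer = self._front[-1][1] if self._front else self._identity
                self._front.append((value, self._combine(value, newer)))
        if not self._front:
            raise IndexError("evict from an empty window")
        return self._front.pop()[0]

    def query(self) -> T:
        """Aggregate of the whole window, oldest to newest: one step."""
        front_agg = self._front[-1][1] if self._front else self._identity
        back_agg = self._back[-1][1] if self._back else self._identity
        return self._combine(front_agg, back_agg)


if __name__ == "__main__":
    # max is associative but not invertible.
    window = TwoStacksWindow(max, float("-inf"))
    for x in (3, 1, 4):
        window.insert(x)
    print(window.query())  # 4
    window.evict()         # removes 3 (triggers the O(n) flip)
    window.evict()         # removes 1
    window.insert(2)
    print(window.query())  # 4
```

Because the partial aggregates are always combined in window order (older on the left, newer on the right), the sketch also works for non-commutative operators such as string concatenation.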

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Stream management

Published in

DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems
June 2017
393 pages
ISBN: 9781450350655
DOI: 10.1145/3093742

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
(de-)amortization
Real-time
continuous analytics
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
DEBS '17 paper acceptance rate: 22 of 60 submissions (37%). Overall acceptance rate: 130 of 553 submissions (24%).