DOI: 10.1145/3093742.3093925
research-article

Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time

Published: 08 June 2017

ABSTRACT

Sliding-window aggregation is a widely used approach for extracting insights from the most recent portion of a data stream. The aggregations of interest can usually be cast as binary operators that are associative, but not necessarily commutative or invertible. Non-invertible operators, however, are difficult to support efficiently. The best published algorithms require O(log n) aggregation steps per window operation, where n is the sliding-window size at that point. For a FIFO window, this can be improved to O(1) on average (that is, amortized) by using two aggregation stacks.

This paper presents DABA, a novel algorithm for aggregating FIFO sliding windows that significantly improves upon these time bounds. DABA requires only O(1) aggregation steps per operation in the worst case (not just on average). As such, DABA asymptotically improves the performance of sliding-window aggregation without restricting the operator to be invertible. Our experimental results demonstrate that these theoretical improvements hold in practice. DABA is a substantial improvement over the state of the art in terms of both latency and throughput.
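
The two-stack baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not DABA and is not taken from the paper; it is one common way to build a FIFO window from two aggregation stacks, and the class name TwoStacksWindow and its method names are assumptions made for illustration. Each stack entry stores a value together with a partial aggregate, so a query combines only the two stack tops. An eviction occasionally has to move every element from the back stack to the front stack, which is why the bound is O(1) only on average.

```python
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class TwoStacksWindow(Generic[T]):
    """Hypothetical FIFO window over an associative operator (illustrative only)."""

    def __init__(self, combine: Callable[[T, T], T], identity: T) -> None:
        self._combine = combine    # associative; need not be commutative or invertible
        self._identity = identity  # neutral element of combine
        # Each entry is (value, partial aggregate).
        # front: top is the oldest element; its aggregate covers the whole front stack.
        # back: top is the newest element; its aggregate covers the whole back stack.
        self._front: List[Tuple[T, T]] = []
        self._back: List[Tuple[T, T]] = []

    def _agg(self, stack: List[Tuple[T, T]]) -> T:
        return stack[-1][1] if stack else self._identity

    def insert(self, value: T) -> None:
        """Append the newest element: one combine step."""
        self._back.append((value, self._combine(self._agg(self._back), value)))

    def evict(self) -> None:
        """Remove the oldest element: O(n) when a flip is needed, O(1) amortized."""
        if not self._front:
            # Flip: move everything to the front stack, newest first, so the
            # oldest element ends up on top. Older values go on the left to
            # preserve order for non-commutative operators.
            while self._back:
                value, _ = self._back.pop()
                self._front.append((value, self._combine(value, self._agg(self._front))))
        if not self._front:
            raise IndexError("evict from an empty window")
        self._front.pop()

    def query(self) -> T:
        """Aggregate of the whole window, oldest to newest: one combine step."""
        return self._combine(self._agg(self._front), self._agg(self._back))


# Usage: a sliding maximum, which has no inverse operation.
window = TwoStacksWindow(max, float("-inf"))
for x in (3, 1, 4):
    window.insert(x)
window.evict()            # drops 3
print(window.query())     # maximum of the remaining elements 1 and 4
```

The flip inside evict is the step whose cost grows with the window size; the abstract's claim for DABA is that it avoids such occasional expensive operations and bounds every operation by O(1) aggregation steps.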

    Published in

      DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems
      June 2017
      393 pages
      ISBN: 9781450350655
      DOI: 10.1145/3093742

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      • DEBS '17 Paper Acceptance Rate: 22 of 60 submissions, 37%
      • Overall Acceptance Rate: 130 of 553 submissions, 24%
