research-article

Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines

Authors:
Bonaventura Del Monte (Technische Universität Berlin & DFKI GmbH, Berlin, Germany)
Steffen Zeuch (Technische Universität Berlin & DFKI GmbH, Berlin, Germany)
Tilmann Rabl (Hasso Plattner Institute, University of Potsdam, Potsdam, Germany)
Volker Markl (Technische Universität Berlin & DFKI GmbH, Berlin, Germany)

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, June 2020, Pages 2471–2486
https://doi.org/10.1145/3318464.3389723

Published: 31 May 2020

ABSTRACT

Scale-out stream processing engines (SPEs) power large big data applications on high-velocity data streams. Industrial setups require SPEs to sustain outages, varying data rates, and low-latency processing. To meet these requirements, SPEs need to transparently reconfigure stateful queries at runtime. However, state-of-the-art SPEs are not yet ready to handle on-the-fly reconfigurations of queries with terabytes of state because of three problems: network overhead for state migration, consistency, and overhead on data processing. In this paper, we propose Rhino, a library for efficient reconfiguration of running queries in the presence of very large distributed state. Rhino provides a handover protocol and a state migration protocol to consistently and efficiently migrate stream processing among servers. Overall, our evaluation shows that Rhino scales with state sizes of up to terabytes, reconfigures a running query 15 times faster than the state of the art, and reduces latency by three orders of magnitude upon a reconfiguration.



Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs
        MapReduce-based systems
      2. Stream management


Published in
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464
General Chairs: David Maier (Portland State University, USA), Rachel Pottinger (University of British Columbia, Canada)
Program Chairs: AnHai Doan (University of Wisconsin, USA), Wang-Chiew Tan (Megagon Labs, USA)
Publications Chairs: Abdussalam Alawini (University of Illinois at Urbana-Champaign, USA), Hung Q. Ngo (RelationalAI, USA)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distributed and parallel databases
stateful stream processing
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%