Abstract
As data becomes dynamic, large, and distributed, there is increasing demand for what have become known as distributed stream algorithms. Since continuously collecting the data at a central server and processing it there is infeasible, a common approach is to define local conditions at the distributed nodes such that, as long as they are maintained, some desirable global condition holds.
Previous methods derived local conditions with a focus on communication efficiency. While very useful for reducing communication volume, these local conditions often impose a heavy computational burden on the nodes. The computational complexity of the local conditions affects both runtime and energy consumption, which are especially critical for resource-limited devices such as smartphones and sensor nodes. Such devices are becoming ubiquitous with the recent trend toward smart cities and the Internet of Things. To accommodate the high data rates and limited resources of these devices, it is crucial that the local conditions can be evaluated quickly and efficiently.
Here we propose a novel approach, designated CB (for Convex/Concave Bounds). CB defines local conditions using suitably chosen convex and concave functions. Lightweight and simple, these local conditions can be checked rapidly on the fly. CB's superiority over the state of the art is demonstrated by its reduced runtime and power consumption, by up to six orders of magnitude in some cases. In addition, CB reduced communication overhead in all the tested application scenarios.
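The idea of a cheap local condition built from a convex bound can be illustrated with a minimal sketch. It is not taken from the paper: the monitored function `f`, the bound `convex_upper_bound`, the threshold, and the helper names are all hypothetical, and the global quantity is assumed to be the average of one scalar per node. The sketch relies only on Jensen's inequality: if a convex function c satisfies c(x) >= f(x) everywhere and every node sees c(x_i) <= T, then f(mean) <= c(mean) <= mean of c(x_i) <= T.

```python
from statistics import fmean
from typing import List, Sequence


def f(x: float) -> float:
    """Hypothetical monitored function; it is not convex (kink at 0)."""
    return x * x - abs(x)


def convex_upper_bound(x: float) -> float:
    """Hypothetical convex bound: x*x >= x*x - |x| for every x."""
    return x * x


def local_condition_holds(x_local: float, threshold: float) -> bool:
    """Cheap per-node check: one multiplication and one comparison."""
    return convex_upper_bound(x_local) <= threshold


def nodes_in_violation(local_values: Sequence[float], threshold: float) -> List[int]:
    """Indices of nodes whose local condition fails and that would need to report."""
    return [i for i, x in enumerate(local_values)
            if not local_condition_holds(x, threshold)]


THRESHOLD = 4.0  # assumed global condition: f(average) <= THRESHOLD

# All local conditions hold, so the global condition is guaranteed.
quiet = [1.5, -2.0, 0.5, 1.0]
assert nodes_in_violation(quiet, THRESHOLD) == []
assert f(fmean(quiet)) <= THRESHOLD

# One node violates its local condition; the check is conservative,
# so the global condition may still hold, but the node must report.
noisy = [1.5, -2.0, 0.5, 3.0]
print(nodes_in_violation(noisy, THRESHOLD))
```

In this sketch a local violation only signals that the guarantee no longer follows from the local checks; what happens next (for example, resynchronizing with a coordinator) is outside its scope.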