
Lightweight Monitoring of Distributed Streams

Published: 31 July 2018

Abstract

As data becomes dynamic, large, and distributed, there is increasing demand for what have become known as distributed stream algorithms. Since continuously collecting the data to a central server and processing it there is infeasible, a common approach is to define local conditions at the distributed nodes, such that—as long as they are maintained—some desirable global condition holds.

Previous methods derived local conditions with a focus on communication efficiency. While these local conditions have proved very useful for reducing communication volume, they often impose a heavy computational burden on the nodes. The computational complexity of the local conditions affects both runtime and energy consumption, which are especially critical for resource-limited devices such as smartphones and sensor nodes. Such devices are becoming ubiquitous with the recent trend toward smart cities and the Internet of Things. To accommodate the high data rates and limited resources of these devices, it is crucial that the local conditions be evaluated quickly and efficiently.

Here we propose a novel approach, designated CB (for Convex/Concave Bounds). CB defines local conditions using suitably chosen convex and concave functions. Lightweight and simple, these local conditions can be rapidly checked on the fly. CB’s superiority over the state-of-the-art is demonstrated in its reduced runtime and power consumption, by up to six orders of magnitude in some cases. As an added bonus, CB also reduced communication overhead in all the tested application scenarios.
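
To make the idea of a convex-bound local condition concrete, here is a minimal sketch. It is not the paper's CB algorithm: the monitored function f(x) = x0 * x1, the bound c(x) = (x0^2 + x1^2) / 2, the threshold, and every name below are assumptions chosen for illustration. Suppose the global condition is f(mean of the local vectors) <= T. Because c is convex and c >= f everywhere (their difference is (x0 - x1)^2 / 2), Jensen's inequality gives f(mean) <= c(mean) <= average of c(x_i). So if every node observes c(x_i) <= T on its own data, the global condition holds and no communication is needed.

```python
from typing import List, Sequence

Vector = Sequence[float]


def f(x: Vector) -> float:
    """Hypothetical monitored function (neither convex nor concave)."""
    return x[0] * x[1]


def convex_upper_bound(x: Vector) -> float:
    """Assumed convex bound: (x0^2 + x1^2) / 2 >= x0 * x1 for all x."""
    return 0.5 * (x[0] ** 2 + x[1] ** 2)


def local_condition_holds(x: Vector, threshold: float) -> bool:
    """Cheap per-node check: a few multiplications and one comparison."""
    return convex_upper_bound(x) <= threshold


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Coordinate-wise average of the nodes' local vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


# Illustrative data: three nodes, each holding a 2-dimensional local vector.
local_vectors = [[1.0, 2.0], [2.0, 1.5], [0.5, 2.5]]
threshold = 4.0

# Each node evaluates its own condition without talking to anyone.
all_local_ok = all(local_condition_holds(v, threshold) for v in local_vectors)

# Computed here only to verify the guarantee; in a real deployment this
# value would not be available without collecting the data centrally.
global_value = f(mean_vector(local_vectors))

print("all local conditions hold:", all_local_ok)
print("global condition holds:", global_value <= threshold)
```

A concave lower bound handles the opposite side of the threshold (f(mean) >= T) symmetrically. In this sketch a node whose check fails, for example one holding [3.0, 3.0], would have to report to a coordinator; how such violations are resolved is outside the scope of the example.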

Published in

ACM Transactions on Database Systems, Volume 43, Issue 2
Best of ICDT 2017 and Regular Papers
June 2018
154 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/3243648

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 31 July 2018
• Accepted: 1 May 2018
• Revised: 1 March 2018
• Received: 1 September 2017


Qualifiers

• research-article
• Research
• Refereed
