research-article

Learning Based Distributed Tracking

Authors:
Hao WU, The University of Melbourne, Melbourne, VIC, Australia
Junhao Gan, The University of Melbourne, Melbourne, VIC, Australia
Rui Zhang, The University of Melbourne, Melbourne, VIC, Australia

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 2020, Pages 2040–2050. https://doi.org/10.1145/3394486.3403255

Published: 20 August 2020


ABSTRACT

Inspired by the great success of machine learning over the past decade, researchers have begun to ask whether theoretical guarantees can be improved by exploiting the distribution of the data. In this paper, we revisit a fundamental problem called Distributed Tracking (DT) under the assumption that the data follows a certain (known or unknown) distribution, and propose a number of data-dependent algorithms with improved theoretical bounds. Informally, the DT problem has a coordinator and k players: the coordinator holds a threshold N and each player holds a counter. At each time stamp, at most one counter is increased by one. The coordinator's job is to capture the exact moment at which the sum of the k counters reaches N, and the goal is to minimise the communication cost. Our first type of algorithm assumes that the concrete data distribution is known in advance, whereas our second type learns the distribution on the fly. Both types achieve a communication cost bounded by O(k log log N) with high probability, improving on the state-of-the-art data-independent bound of O(k log(N/k)). We further propose a number of implementation optimisation heuristics that improve both the efficiency and the robustness of the algorithms. Finally, we conduct extensive experiments on three real datasets and four synthetic datasets. The experimental results show that the communication cost of our algorithms can be as low as 20% of that of the state-of-the-art algorithms.
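The DT setting can be made concrete with a minimal Python sketch of a standard round-based deterministic protocol of the kind that attains the data-independent O(k log(N/k)) bound. It is not the paper's learned algorithm. Several choices here are illustrative assumptions: the name `track_threshold`, the unit-cost message model, and drawing each increment from a fixed probability vector `probs` with exactly one increment per time step. In each round, the coordinator broadcasts a step size of roughly (remaining slack)/(2k). Each player signals whenever its in-round count crosses another multiple of that step. After k signals, the coordinator polls for exact counts. Once fewer than 2k items remain, every increment is reported directly.

```python
import random


def track_threshold(k, N, probs, seed=0):
    """Simulate a classic round-based, data-independent DT protocol.

    Hypothetical setup: each time step exactly one player's counter grows
    by one, chosen according to `probs`, and every message (broadcast,
    signal, poll, reply) costs one unit.
    Returns (time step at which the coordinator detects N, messages sent).
    """
    rng = random.Random(seed)
    counters = [0] * k   # true local counters held by the players
    known = 0            # exact total the coordinator has confirmed
    messages = 0
    t = 0
    while True:
        slack = (N - known) // (2 * k)   # per-player step size for this round
        if slack < 1:
            break
        messages += k                     # broadcast the step size
        since = [0] * k                   # increments since the round began
        signals = 0
        while signals < k:
            t += 1
            i = rng.choices(range(k), weights=probs)[0]
            counters[i] += 1
            since[i] += 1
            if since[i] % slack == 0:     # crossed another multiple of slack
                signals += 1
                messages += 1
        # k signals imply at least k*slack new items but fewer than 2k*slack,
        # so the round never overshoots N; now collect the exact counts.
        messages += 2 * k
        known = sum(counters)
    # Fewer than 2k items remain: switch to reporting every increment.
    messages += k
    while known < N:
        t += 1
        i = rng.choices(range(k), weights=probs)[0]
        counters[i] += 1
        known += 1
        messages += 1
    return t, messages


if __name__ == "__main__":
    # Skewed (hypothetical) distribution: player 0 receives half the items.
    skew = [0.5] + [0.5 / 9] * 9
    print(track_threshold(k=10, N=100_000, probs=skew))
```

Each round costs O(k) messages and roughly halves the remaining slack, which gives about log(N/k) rounds. The protocol never looks at `probs`. This is the data-independence that the distribution-aware algorithms described above improve upon.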

Supplemental Material
3394486.3403255.mp4 (MP4 video, 276.6 MB)

Index Terms
Mathematics of computing → Probability and statistics → Probabilistic algorithms

Published in
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486
General Chairs: Rajesh Gupta (UC San Diego, USA), Yan Liu (USC, USA)
Program Chairs: Mohak Shah (LG Electronics, USA), Suju Rajan (LinkedIn, USA)
Publications Chairs: Jiliang Tang (Michigan State, USA), B. Aditya Prakash (Georgia Tech, USA)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: algorithms, distributed tracking, machine learning, sampling
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
Article Metrics
Total Citations: 1
Total Downloads: 212
Downloads (last 12 months): 13
Downloads (last 6 weeks): 2
