ABSTRACT
Time series of count data are generated in many different contexts, such as web access logging, freeway traffic monitoring, and security logs associated with buildings. Since this data measures the aggregated behavior of individual human beings, it typically exhibits periodicity on several time scales (daily, weekly, etc.) that reflects the rhythms of the underlying human activity and makes the data appear non-homogeneous. At the same time, the data is often corrupted by bursty periods of unusual behavior, such as building events and traffic accidents. Both of these elements make the data mining problem of finding and extracting these anomalous events difficult. In this paper we describe a framework for unsupervised learning in this context, based on a time-varying Poisson process model that can also account for anomalous events. We show how the parameters of this model can be learned from count time series using statistical estimation techniques. We demonstrate the utility of this model on two datasets for which we have partial ground truth in the form of known events, one from freeway traffic data and another from building access data, and show that the model performs significantly better than a non-probabilistic, threshold-based technique. We also describe how the model can be used to investigate different degrees of periodicity in the data, including systematic day-of-week and time-of-day effects, and to make inferences about the detected events (e.g., popularity or level of attendance). Our experimental results indicate that the proposed time-varying Poisson model provides a robust and accurate framework for adaptively and autonomously learning how to separate unusual bursty events from traces of normal human activity.
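To make the idea of a periodic, time-varying Poisson rate concrete, here is a minimal Python sketch. It is not the paper's method. It assumes a multiplicative rate lambda(t) = lambda0 * delta_d * eta_{d,h}, with a day-of-week factor delta_d and a time-of-day factor eta_{d,h}; this factorization and every function name (`poisson_upper_tail`, `fit_cell_rates`, `decompose`, `detect_events`) and parameter (`alpha`, `n_iter`) are hypothetical. Each rate is fit by per-cell averaging, which is the Poisson maximum-likelihood estimate when every (day, slot) cell has its own free rate. A bin is flagged when its upper-tail Poisson probability falls below `alpha`. Unlike the model described above, the sketch does not represent anomalous events probabilistically. It only refits with flagged bins excluded, so treat it as a rough illustration.

```python
import math
from collections import defaultdict


def poisson_upper_tail(n, lam):
    """Return P(N >= n) for N ~ Poisson(lam), computed in log space."""
    if n <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def log_pmf(k):
        return -lam + k * math.log(lam) - math.lgamma(k + 1)

    if n <= lam:
        # Tail is not small: use the complement of the lower sum.
        return max(0.0, 1.0 - sum(math.exp(log_pmf(k)) for k in range(n)))
    # Tail is small: sum upper terms directly (they shrink once k > lam),
    # which avoids cancellation in 1 - (number close to 1).
    total, k = 0.0, n
    while True:
        term = math.exp(log_pmf(k))
        total += term
        if term == 0.0 or term < total * 1e-16:
            break
        k += 1
    return min(1.0, total)


def fit_cell_rates(counts, days, slots):
    """Estimate one Poisson rate per (day-of-week, time-slot) cell.

    The per-cell sample mean is the maximum-likelihood rate when each
    cell has its own free rate and no events are present."""
    sums, nums = defaultdict(float), defaultdict(int)
    for c, d, h in zip(counts, days, slots):
        sums[(d, h)] += c
        nums[(d, h)] += 1
    return {cell: sums[cell] / nums[cell] for cell in sums}


def decompose(rates):
    """Split cell rates into lambda0 * delta[d] * eta[d][h].

    Hypothetical normalization: delta averages to 1 over days and eta
    averages to 1 within each day. Assumes every day has the same slots."""
    lam0 = sum(rates.values()) / len(rates)
    delta, eta = {}, {}
    for d in sorted({d for d, _ in rates}):
        day = {h: r for (dd, h), r in rates.items() if dd == d}
        day_mean = sum(day.values()) / len(day)
        delta[d] = day_mean / lam0 if lam0 > 0 else 0.0
        eta[d] = {h: (r / day_mean if day_mean > 0 else 0.0) for h, r in day.items()}
    return lam0, delta, eta


def detect_events(counts, days, slots, alpha=1e-3, n_iter=3):
    """Flag bins whose count is improbably high under the fitted rates.

    Rates are refit with flagged bins excluded, a crude heuristic
    stand-in for modelling the event process explicitly."""
    flagged = [False] * len(counts)
    rates = {}
    for _ in range(n_iter):
        keep = [i for i, f in enumerate(flagged) if not f]
        rates = fit_cell_rates([counts[i] for i in keep],
                               [days[i] for i in keep],
                               [slots[i] for i in keep])
        flagged = [poisson_upper_tail(c, rates.get((d, h), 0.0)) < alpha
                   for c, d, h in zip(counts, days, slots)]
    return flagged, rates


if __name__ == "__main__":
    # Toy data: 2 "days" x 2 time slots over 10 weeks, one burst at index 13.
    days = [0, 0, 1, 1] * 10
    slots = [0, 1, 0, 1] * 10
    counts = [5, 10, 2, 4] * 10
    counts[13] = 60
    flagged, rates = detect_events(counts, days, slots)
    print([i for i, f in enumerate(flagged) if f])  # → [13]
    print(decompose(rates)[1])  # day-of-week factors delta[d]
```

The per-cell mean is the maximum-likelihood rate only when no bursts are present. The exclusion-and-refit loop is therefore a crude way to keep bursts from inflating the baseline. The model described in the abstract instead accounts for anomalous events within the probabilistic model itself.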
Index Terms
- Adaptive event detection with time-varying Poisson processes