Abstract
This paper describes an unsupervised algorithm for segmenting categorical time series. The algorithm first collects statistics about the frequency and boundary entropy of n-grams. It then passes a window over the series, and two "expert methods" decide where in the window boundaries should be drawn. The algorithm successfully segments text into words in three languages. We claim that the algorithm finds meaningful episodes in categorical time series because it exploits two statistical characteristics of meaningful episodes.
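The abstract describes the procedure only at a high level. What follows is a minimal Python sketch of one plausible reading of it, not the authors' implementation. Every function name (`ngram_stats`, `boundary_entropy`, `standardize`, `segment`) and parameter is hypothetical. Several details are assumptions rather than facts taken from the paper: z-scoring the statistics separately for each n-gram length, the exact scoring rule each expert uses, the default window length, the vote threshold, and the rule of cutting only at local maxima of the vote count.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def ngram_stats(seq, max_n):
    """Count each n-gram (1 <= n <= max_n) and tally the symbol that follows it."""
    freq = defaultdict(int)
    followers = defaultdict(lambda: defaultdict(int))
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            gram = tuple(seq[i:i + n])
            freq[gram] += 1
            if i + n < len(seq):
                followers[gram][seq[i + n]] += 1
    return freq, followers


def boundary_entropy(gram, followers):
    """Shannon entropy (bits) of the symbol that follows `gram` (0 if none follows)."""
    counts = followers.get(gram)
    if not counts:
        return 0.0
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def standardize(score):
    """Assumption: z-score values within each n-gram length so lengths are comparable."""
    by_len = defaultdict(list)
    for gram, v in score.items():
        by_len[len(gram)].append(v)
    params = {n: (mean(vs), pstdev(vs)) for n, vs in by_len.items()}
    z = {}
    for gram, v in score.items():
        mu, sd = params[len(gram)]
        z[gram] = (v - mu) / sd if sd > 0 else 0.0
    return z


def segment(seq, window=4, threshold=2):
    """Hypothetical two-expert segmenter; returns a list of episodes (lists of symbols)."""
    seq = list(seq)
    freq, followers = ngram_stats(seq, window)
    z_freq = standardize(freq)
    z_ent = standardize({g: boundary_entropy(g, followers) for g in freq})

    votes = [0] * (len(seq) + 1)  # votes[p] counts votes for a boundary just before seq[p]
    splits = range(1, window)
    for i in range(len(seq) - window + 1):
        w = tuple(seq[i:i + window])
        # Frequency expert (assumed rule): prefer the split whose two halves are most frequent.
        k_freq = max(splits, key=lambda k: z_freq[w[:k]] + z_freq[w[k:]])
        # Entropy expert (assumed rule): prefer the split after which the next symbol
        # is least predictable, i.e. where boundary entropy is highest.
        k_ent = max(splits, key=lambda k: z_ent[w[:k]])
        votes[i + k_freq] += 1
        votes[i + k_ent] += 1

    # Assumed cut rule: enough votes, and a local maximum of the vote count.
    cuts = [p for p in range(1, len(seq))
            if votes[p] >= threshold
            and votes[p] > votes[p - 1] and votes[p] >= votes[p + 1]]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

For text, the episodes can be joined back into strings, e.g. `["".join(e) for e in segment("thecatsatonthemat" * 5)]`. Returning lists of symbols rather than strings keeps the sketch applicable to any categorical series, not just characters. Because every cut lies strictly inside the series, the episodes are non-empty and always concatenate back to the input.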
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cohen, P., Adams, N. (2001). An Algorithm for Segmenting Categorical Time Series into Meaningful Episodes. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds) Advances in Intelligent Data Analysis. IDA 2001. Lecture Notes in Computer Science, vol 2189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44816-0_20
DOI: https://doi.org/10.1007/3-540-44816-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42581-6
Online ISBN: 978-3-540-44816-7
eBook Packages: Springer Book Archive