ABSTRACT
Efficiently searching for similarities among time series and discovering interesting patterns is an important and non-trivial problem with applications in many domains. The high dimensionality of the data makes this analysis very challenging, and many dimensionality reduction methods have been proposed to address it. PCA (Piecewise Constant Approximation) and its variants have been shown to be efficient for time series indexing and similarity retrieval. However, in certain applications the approximation introduces so many false alarms that overall performance can drop dramatically. In this paper, we introduce a new piecewise dimensionality reduction technique based on vector quantization. The new technique, PVQA (Piecewise Vector Quantized Approximation), partitions each sequence into equal-length segments and uses vector quantization to represent each segment by its closest codeword (under a chosen distance metric) from a codebook of key-sequences. Because the new representation has significantly lower dimensionality, computations on it are more efficient. We demonstrate the utility and efficiency of the proposed technique on real and simulated datasets. By exploiting prior knowledge about the data, the proposed technique generally outperforms PCA and its variants in similarity searches.
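The segment-and-quantize idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the codebook is trained with a Lloyd-style (k-means) iteration over training segments, "closest" is taken to mean squared Euclidean distance, series lengths are assumed to be multiples of the segment length, and the distance between two encoded series is computed from a precomputed table of codeword-to-codeword distances. All function names (`segment`, `train_codebook`, `encode`, `distance_table`, `pvqa_distance`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def segment(series: Sequence[float], seg_len: int) -> List[Vector]:
    """Split a series into equal-length segments (assumes len is a multiple of seg_len)."""
    if len(series) % seg_len != 0:
        raise ValueError("series length must be a multiple of seg_len")
    return [list(series[i:i + seg_len]) for i in range(0, len(series), seg_len)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of distance metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: List[Vector], v: Sequence[float]) -> int:
    """Index of the codeword closest to v; ties go to the lowest index."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))


def train_codebook(training: List[Vector], size: int,
                   iters: int = 20, seed: int = 0) -> List[Vector]:
    """Hypothetical Lloyd-style (k-means) codebook training on training segments."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(training, size)]  # initial codewords
    for _ in range(iters):
        cells: List[List[Vector]] = [[] for _ in range(size)]
        for v in training:                       # nearest-neighbour partition
            cells[nearest(codebook, v)].append(v)
        for k, cell in enumerate(cells):         # centroid update
            if cell:                             # empty cell: keep old codeword
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def encode(series: Sequence[float], codebook: List[Vector], seg_len: int) -> List[int]:
    """PVQA-style encoding: one codeword index per segment."""
    return [nearest(codebook, s) for s in segment(series, seg_len)]


def distance_table(codebook: List[Vector]) -> List[List[float]]:
    """Precomputed squared distances between every pair of codewords."""
    return [[sq_dist(a, b) for b in codebook] for a in codebook]


def pvqa_distance(codes_a: List[int], codes_b: List[int],
                  table: List[List[float]]) -> float:
    """Euclidean distance between the two codeword reconstructions, computed by
    table lookup; it approximates, not equals, the distance between the raw series."""
    return math.sqrt(sum(table[i][j] for i, j in zip(codes_a, codes_b)))


if __name__ == "__main__":
    book = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    codes = encode([0.1, -0.1, 4.9, 5.2, 1.0, 0.9], book, seg_len=2)
    print(codes)  # [0, 2, 1]
    table = distance_table(book)
    print(pvqa_distance(codes, [0, 1, 1], table))
```

In this sketch, a codebook of K codewords yields a K×K table computed once, so comparing two encoded series of n segments costs n table lookups rather than a pass over every original sample.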
Index Terms
- A dimensionality reduction technique for efficient similarity analysis of time series databases