Article

Locally adaptive dimensionality reduction for indexing large time series databases

Authors:
Eamonn Keogh, Kaushik Chakrabarti, Michael Pazzani, Sharad Mehrotra

Department of Information and Computer Science, University of California, Irvine, California

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data, May 2001, Pages 151–162
https://doi.org/10.1145/375663.375680

Published: 01 May 2001

ABSTRACT

Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight, Euclidean distance approximation. We show how they can support fast exact searching, and even faster approximate searching, on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
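
To make the idea of "constant value segments of varying lengths" concrete, the following is a minimal Python sketch, not the authors' algorithm (the abstract does not describe how the segments are computed). It swaps in a simple greedy bottom-up merge: start with one segment per point and repeatedly merge the adjacent pair whose merge adds the least squared error. The function names `apca_sketch`, `lower_bound_distance` and `euclidean`, and the choice to store each segment as a `(mean_value, right_endpoint)` pair, are assumptions made for illustration. The second function shows one way a distance computed on the reduced representation can lower bound the true Euclidean distance, assuming each segment value is the mean of the points it covers.

```python
from math import sqrt


def apca_sketch(series, num_segments):
    """Greedy bottom-up approximation (an illustrative stand-in, not the paper's method).

    Returns a list of (mean_value, right_endpoint) pairs, where right_endpoint
    is the inclusive index of the last point covered by the segment.
    """
    if not 1 <= num_segments <= len(series):
        raise ValueError("num_segments must be between 1 and len(series)")
    # Each segment is stored as [sum_of_values, point_count].
    segs = [[float(x), 1] for x in series]
    while len(segs) > num_segments:
        best_i, best_cost = 0, float("inf")
        for i in range(len(segs) - 1):
            (s1, n1), (s2, n2) = segs[i], segs[i + 1]
            # Increase in squared error if the two segments share one mean.
            cost = n1 * n2 / (n1 + n2) * (s1 / n1 - s2 / n2) ** 2
            if cost < best_cost:
                best_i, best_cost = i, cost
        s2, n2 = segs.pop(best_i + 1)
        segs[best_i][0] += s2
        segs[best_i][1] += n2
    result, end = [], -1
    for total, count in segs:
        end += count
        result.append((total / count, end))
    return result


def lower_bound_distance(query, approx):
    """Distance between a raw query and a segment representation.

    Per segment, the query is replaced by its mean over that segment. When each
    segment value is the mean of the original points, this never exceeds the
    Euclidean distance to the original series (Cauchy-Schwarz inequality).
    """
    total, start = 0.0, 0
    for value, right in approx:
        length = right - start + 1
        q_mean = sum(query[start:right + 1]) / length
        total += length * (q_mean - value) ** 2
        start = right + 1
    return sqrt(total)


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Calling `apca_sketch(series, M)` reduces a series of any length to `M` pairs, which is the kind of fixed-size record a multidimensional index can store. Because `lower_bound_distance` never overestimates the true distance, pruning candidates with it cannot discard a true match, which is the property exact search relies on.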

Published in
SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN: 1581133324
DOI: 10.1145/375663
Editors: Timos Sellis, Sharad Mehrotra

ACM SIGMOD Record Volume 30, Issue 2
June 2001
625 pages
ISSN: 0163-5808
DOI: 10.1145/376284
Editors: Timos Sellis (National Technical Univ. of Athens), Sharad Mehrotra (Univ. of California at Irvine)
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
content-based retrieval
dimensionality reduction
indexing
Acceptance Rates
SIGMOD '01 Paper Acceptance Rate: 44 of 293 submissions, 15%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%