ABSTRACT
Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique, which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT, and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant-value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower-bounding Euclidean distance approximation, and a non-lower-bounding but very tight Euclidean distance approximation. We show how they can support fast exact searching, and even faster approximate searching, on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
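To make the idea of constant-value segments of varying lengths concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a simple greedy bottom-up merge of adjacent segments, an illustrative `(mean, right_endpoint)` pair per segment, and a lower bound obtained by averaging the query over each segment. The names `apca` and `lower_bound_distance` are hypothetical.

```python
import math


def apca(series, num_segments):
    """Approximate `series` by `num_segments` constant segments.

    Assumption: a greedy bottom-up merge stands in for whatever
    construction the paper uses. Returns (mean, right_endpoint) pairs,
    where right_endpoint is an exclusive index into the series.
    """
    if not 1 <= num_segments <= len(series):
        raise ValueError("num_segments must be between 1 and len(series)")
    # Start with one segment per point, stored as (sum, count).
    segs = [(float(x), 1) for x in series]
    while len(segs) > num_segments:
        best_i, best_cost = 0, math.inf
        for i in range(len(segs) - 1):
            (sa, na), (sb, nb) = segs[i], segs[i + 1]
            # Increase in squared reconstruction error if the pair is merged.
            cost = na * nb / (na + nb) * (sa / na - sb / nb) ** 2
            if cost < best_cost:
                best_i, best_cost = i, cost
        (sa, na), (sb, nb) = segs[best_i], segs[best_i + 1]
        segs[best_i:best_i + 2] = [(sa + sb, na + nb)]
    result, end = [], 0
    for total, count in segs:
        end += count
        result.append((total / count, end))
    return result


def lower_bound_distance(query, segments):
    """Lower bound on the Euclidean distance between `query` and the
    original series that `segments` was built from.

    Assumption: each segment value is the mean of the original points it
    covers, so replacing the query by its per-segment mean cannot
    overestimate the true distance.
    """
    total, start = 0.0, 0
    for value, end in segments:
        length = end - start
        q_mean = sum(query[start:end]) / length
        total += length * (q_mean - value) ** 2
        start = end
    return math.sqrt(total)


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, `apca([1, 1, 1, 5, 5, 9, 9, 9], 3)` yields three segments of lengths 3, 2, and 3. Because the bound never exceeds the true distance, using it to prune candidates during an exact search cannot discard a true match.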
Index Terms
- Locally adaptive dimensionality reduction for indexing large time series databases