Locally adaptive dimensionality reduction for indexing large time series databases

Published: 01 May 2001

ABSTRACT

Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions perform dimensionality reduction on the data, then index the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique, which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT, and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant-value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower-bounding Euclidean distance approximation, and a non-lower-bounding but very tight Euclidean distance approximation. We show how they can support fast exact searching, and even faster approximate searching, on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
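To make the idea of constant-value segments of varying lengths concrete, the following is a minimal sketch in Python. It is not the algorithm from the paper, which the abstract does not describe: it swaps in a simple greedy bottom-up merge that repeatedly joins the two adjacent segments whose merge adds the least squared error. The names `piecewise_constant`, `reconstruct`, and `lower_bound_distance`, and the `(mean value, right endpoint)` segment layout, are assumptions made for illustration. The distance function shows one way a lower bound on the Euclidean distance can be obtained when each segment stores the exact mean of its points; whether it matches the paper's definition is not stated in the abstract.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (mean value, index of the segment's last point, inclusive).
Segment = Tuple[float, int]


def _sse(values: Sequence[float]) -> float:
    """Squared error of approximating `values` by their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def piecewise_constant(series: Sequence[float], num_segments: int) -> List[Segment]:
    """Greedy bottom-up merge; an illustrative stand-in, not the paper's method."""
    if not 1 <= num_segments <= len(series):
        raise ValueError("num_segments must be between 1 and len(series)")
    # Start with one half-open [lo, hi) segment per point.
    bounds = [(i, i + 1) for i in range(len(series))]
    while len(bounds) > num_segments:
        best_i, best_cost = 0, math.inf
        for i in range(len(bounds) - 1):
            (lo, mid), (_, hi) = bounds[i], bounds[i + 1]
            # Extra squared error introduced by merging segments i and i + 1.
            cost = _sse(series[lo:hi]) - _sse(series[lo:mid]) - _sse(series[mid:hi])
            if cost < best_cost:
                best_i, best_cost = i, cost
        lo, hi = bounds[best_i][0], bounds[best_i + 1][1]
        bounds[best_i:best_i + 2] = [(lo, hi)]
    return [(sum(series[lo:hi]) / (hi - lo), hi - 1) for lo, hi in bounds]


def reconstruct(segments: Sequence[Segment]) -> List[float]:
    """Expand the segments back into a full-length series."""
    out: List[float] = []
    start = 0
    for value, right in segments:
        out.extend([value] * (right - start + 1))
        start = right + 1
    return out


def lower_bound_distance(query: Sequence[float], segments: Sequence[Segment]) -> float:
    """Lower bound on the Euclidean distance between `query` and the original
    series, valid when each segment value is the exact mean of its points."""
    total = 0.0
    start = 0
    for value, right in segments:
        length = right - start + 1
        query_mean = sum(query[start:right + 1]) / length
        total += length * (query_mean - value) ** 2
        start = right + 1
    return math.sqrt(total)


series = [1, 1, 1, 1, 5, 5, 9, 9]
segments = piecewise_constant(series, 3)
print(segments)  # [(1.0, 3), (5.0, 5), (9.0, 7)]
```

The segment lengths (4, 2, and 2 points) adapt to where the series changes, which is the property the abstract contrasts with a common representation shared by all items. The greedy merge is quadratic in the series length and is meant only to show the representation, not to be efficient.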

Published in

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data
May 2001
630 pages
ISBN: 1581133324
DOI: 10.1145/375663

Copyright © 2001 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGMOD '01 paper acceptance rate: 44 of 293 submissions, 15%
Overall acceptance rate: 785 of 4,003 submissions, 20%
