DOI: 10.1145/1081870.1081966
Article

Fast window correlations over uncooperative time series

Published: 21 August 2005

ABSTRACT

Data arriving in time order (a data stream) arises in fields including physics, finance, medicine, and music, to name a few. Often the data comes from sensors (in physics and medicine, for example) whose data rates continue to improve dramatically as sensor technology improves. Further, the number of sensors is increasing, so correlating data between sensors becomes ever more critical in order to distill knowledge from the data. In many applications such as finance, recent correlations are of far more interest than long-term correlation, so correlation over sliding windows (windowed correlation) is the desired operation. Fast response is desirable in many applications (e.g., to aim a telescope at an activity of interest or to perform a stock trade). These three factors -- data size, windowed correlation, and fast response -- motivate this work.

Previous work [10, 14] showed how to compute Pearson correlation using Fast Fourier Transforms and Wavelet transforms, but such techniques don't work for time series in which the energy is spread over many frequency components, thus resembling white noise. For such "uncooperative" time series, this paper shows how to combine several simple techniques -- sketches (random projections), convolution, structured random vectors, grid structures, and combinatorial design -- to achieve high performance windowed Pearson correlation over a variety of data sets.
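To make the sketch idea in the abstract concrete, here is a minimal illustration of estimating the Pearson correlation of two windows from random projections. It is a simplified sketch under stated assumptions, not the paper's algorithm: it uses plain +1/-1 random vectors and a single fixed window, and it omits the convolution, structured random vectors, grid structures, and combinatorial design that the abstract lists. All function names and parameters (`normalize`, `make_projection`, `sketch`, `estimated_correlation`, window length 256, sketch size 64) are hypothetical. It relies on the standard identity that, for windows shifted to zero mean and scaled to unit length, the squared Euclidean distance d^2 satisfies corr = 1 - d^2 / 2.

```python
import math
import random


def normalize(window):
    """Shift a window to zero mean and scale it to unit Euclidean length."""
    n = len(window)
    mean = sum(window) / n
    centered = [v - mean for v in window]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("a constant window has undefined correlation")
    return [c / norm for c in centered]


def pearson(x, y):
    """Exact Pearson correlation of two equal-length windows."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def make_projection(window_len, sketch_size, rng):
    """One random +1/-1 vector per sketch coordinate (an assumed choice)."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(window_len)]
            for _ in range(sketch_size)]


def sketch(window, projection):
    """Inner product of the normalized window with each random vector."""
    z = normalize(window)
    return [sum(r * v for r, v in zip(row, z)) for row in projection]


def estimated_correlation(sketch_x, sketch_y):
    """Estimate correlation from the sketch distance via corr = 1 - d^2 / 2."""
    k = len(sketch_x)
    # Averaging squared coordinate differences estimates the squared distance
    # between the two normalized windows.
    d2 = sum((a - b) ** 2 for a, b in zip(sketch_x, sketch_y)) / k
    return 1.0 - d2 / 2.0


# Illustrative data only: two noisy, positively correlated windows.
rng = random.Random(0)
window_len, sketch_size = 256, 64
x = [rng.gauss(0.0, 1.0) for _ in range(window_len)]
y = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in x]

projection = make_projection(window_len, sketch_size, rng)
exact = pearson(x, y)
approx = estimated_correlation(sketch(x, projection), sketch(y, projection))
print(f"exact={exact:.3f} approx={approx:.3f}")
```

Each window is reduced from 256 values to 64 sketch coordinates, so pairwise comparisons work on the shorter vectors. The estimate is approximate and tightens as the sketch size grows.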

References

  1. A. Gionis, P. Indyk and R. Motwani, Similarity Search in High Dimensions via Hashing, VLDB, 518--529, 1999.
  2. S. Mallat and Z. Zhang, Matching Pursuit With Time-Frequency Dictionaries, IEEE Transactions on Signal Processing, 1993.
  3. T. M. Cover and J. A. Thomas, Elements of Information Theory, New York, Wiley, 1991.
  4. R. Cole, D. Shasha and X. J. Zhao, Fast Window Correlations Over Uncooperative Time Series, Department of Computer Science, New York University, New York, NY, Technical Report, 2005.
  5. C. Alexander, Market Models: A Guide to Financial Data Analysis, John Wiley & Sons, 2001.
  6. Wharton Research Data Services (WRDS), http://wrds.wharton.upenn.edu/
  7. E. Keogh and T. Folias, The UCR Time Series Data Mining Archive. Riverside CA. University of California - Computer Science & Engineering Department, http://www.cs.ucr.edu/~eamonn/TSDMA/index.html, 2002.
  8. B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall/CRC, 1994.
  9. D. M. Cohen, S. R. Dalal, J. Parelius and G. C. Patton, The Combinatorial Design Approach to Automatic Test Generation, IEEE Software, 13, 83--87, 1996.
  10. D. Shasha and Y. Zhu, High Performance Discovery in Time Series: Techniques and Case Studies, Springer, 2003.
  11. D. Achlioptas, Database-friendly Random Projections, ACM SIGMOD-PODS, May, Santa Barbara, CA, 2001.
  12. P. Indyk, Stable distributions, pseudorandom generators, embeddings and data stream computation, Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 0-7695-0850-2, 189, IEEE Computer Society, 2000.
  13. E. Kushilevitz, R. Ostrovsky and Y. Rabani, Efficient Search for Approximate Nearest Neighbors in High Dimensional Spaces, STOC, 1998.
  14. Y. Zhu and D. Shasha, StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time, VLDB, Hong Kong, China, August, 2002.
  15. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan and M. Strauss, Surfing wavelets on streams: One-pass summaries for approximate aggregate queries, VLDB, 2001.
  16. N. Thaper, S. Guha, P. Indyk and N. Koudas, Dynamic multidimensional histograms, SIGMOD, Madison, Wisconsin, 2002.
  17. P. Indyk, N. Koudas and S. Muthukrishnan, Identifying Representative Trends in Massive Time Series Data Sets using Sketches, VLDB, 2000.
  18. G. Cormode, P. Indyk, N. Koudas and S. Muthukrishnan, Fast mining of massive tabular data via approximate distance computations, ICDE, 2002.
  19. W. Johnson and J. Lindenstrauss, Extensions of Lipschitz mapping into Hilbert space, Contemporary Mathematics, 26, 189--206, 1984.
  20. M. Vlachos, M. Hadjieleftheriou, D. Gunopulos and E. Keogh, Indexing multi-dimensional time-series with support for multiple distance measures, SIGKDD, 2003.
  21. E. Keogh, Exact indexing of dynamic time warping, VLDB, 2002.
  22. B. K. Yi and C. Faloutsos, Fast time sequence indexing for arbitrary Lp forms, VLDB, 2000.
  23. T. Palpanas, M. Vlachos, E. Keogh, D. Gunopulos and W. Truppel, Online amnesic approximation of streaming time series, ICDE, 2004.
  24. E. Keogh, K. Chakrabarti, S. Mehrotra and M. Pazzani, Locally Adaptive Dimensionality Reduction for Indexing large Time Series Databases, SIGMOD, 2001.
  25. F. Korn, H. V. Jagadish and C. Faloutsos, Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences, SIGMOD, 1997.
  26. Y. L. Wu, D. Agrawal and A. E. Abbadi, A comparison of dft and dwt based similarity search in time-series databases, The 9th ACM CIKM Int'l Conference on Information and Knowledge Management, 2000.
  27. I. Popivanov and R. Miller, Similarity Search Over Time Series Data Using Wavelets, ICDE, 2002.
  28. K. Chan and A. W. Fu, Efficient Time Series Matching by Wavelets, ICDE, 1999.
  29. D. Rafiei and A. Mendelzon, Similarity-based queries for time series data, SIGMOD, 1997.
  30. C. S. Li, P. S. Yu and V. Castelli, Hierarchyscan: A hierarchical similarity search algorithm for databases of long sequences, ICDE, 1996.
  31. E. Drinea, P. Drineas and P. Huggins, A Randomized Singular Value Decomposition Algorithm for Image Processing, Panhellenic Conference on Informatics (PCI), 2001.
  32. G. Manku, S. Rajagopalan and B. Lindsay, Random Sampling Techniques for Space Efficient Online Computation of order Statistics of Large Datasets, SIGMOD, 1999.
  33. M. Greenwald and S. Khanna, Space-Efficient Online Computation of Quantile Summaries, SIGMOD, 2001.
  34. R. Agrawal, C. Faloutsos and A. Swami, Efficient Similarity Searching In Sequence Databases, Proceedings of the 4th International Conference of Foundations of Data Organization and Algorithms (FODO), Springer Verlag, Chicago, Illinois, 69--84, 1993.
  35. C. Faloutsos, M. Ranganathan and Y. Manolopoulos, Fast subsequence matching in time-series databases, SIGMOD, Minneapolis, MN, May, 1994.
  36. E. Keogh, K. Chakrabarti, M. Pazzani and S. Mehrotra, Dimensionality Reduction for fast similarity search in large time series databases, Knowledge and Information Systems, 3, 263--286, 2000.

            Published in

              KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
              August 2005
              844 pages
              ISBN: 159593135X
              DOI: 10.1145/1081870

              Copyright © 2005 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States


              Acceptance Rates

              Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
