Fast window correlations over uncooperative time series

Authors:
Richard Cole, New York University
Dennis Shasha, New York University
Xiaojian Zhao, New York University

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 2005, Pages 743–749
https://doi.org/10.1145/1081870.1081966

Published: 21 August 2005

ABSTRACT

Data arriving in time order (a data stream) arises in fields including physics, finance, medicine, and music, to name a few. Often the data comes from sensors (in physics and medicine for example) whose data rates continue to improve dramatically as sensor technology improves. Further, the number of sensors is increasing, so correlating data between sensors becomes ever more critical in order to distill knowledge from the data. In many applications such as finance, recent correlations are of far more interest than long-term correlation, so correlation over sliding windows (windowed correlation) is the desired operation. Fast response is desirable in many applications (e.g., to aim a telescope at an activity of interest or to perform a stock trade). These three factors -- data size, windowed correlation, and fast response -- motivate this work.

Previous work [10, 14] showed how to compute Pearson correlation using Fast Fourier Transforms and Wavelet transforms, but such techniques don't work for time series in which the energy is spread over many frequency components, thus resembling white noise. For such "uncooperative" time series, this paper shows how to combine several simple techniques -- sketches (random projections), convolution, structured random vectors, grid structures, and combinatorial design -- to achieve high performance windowed Pearson correlation over a variety of data sets.
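To make the sketch idea in the abstract concrete, the following is a minimal illustration, not the authors' implementation. It relies on a standard identity: if two windows are shifted to zero mean and scaled to unit norm, their Pearson correlation equals 1 - d^2/2, where d is the Euclidean distance between the normalized windows. Projecting each normalized window onto k random +1/-1 vectors gives a short sketch whose scaled squared distance estimates d^2. All function names (`normalize`, `sketch`, `estimate_correlation`, `sliding_windows`) and the choice of plain +1/-1 vectors are assumptions made for this example; the convolution, structured random vectors, grid structures, and combinatorial design mentioned in the abstract are not shown.

```python
import math
import random


def normalize(window):
    """Shift a window to zero mean and scale it to unit Euclidean norm."""
    mean = sum(window) / len(window)
    centered = [x - mean for x in window]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("constant window has undefined correlation")
    return [c / norm for c in centered]


def pearson(x, y):
    """Exact Pearson correlation: inner product of the normalized windows."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def make_random_vectors(k, n, seed=0):
    """k random vectors of length n with entries +1 or -1 (illustrative choice)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(k)]


def sketch(window, random_vectors):
    """Inner products of the normalized window with each random vector."""
    unit = normalize(window)
    return [sum(r * u for r, u in zip(vec, unit)) for vec in random_vectors]


def estimate_correlation(sketch_x, sketch_y):
    """Estimate correlation as 1 - d^2/2, with d^2 estimated from the sketches."""
    k = len(sketch_x)
    dist_sq = sum((a - b) ** 2 for a, b in zip(sketch_x, sketch_y)) / k
    return 1.0 - dist_sq / 2.0


def sliding_windows(series, size, step=1):
    """Yield consecutive windows of the given size over a series."""
    for start in range(0, len(series) - size + 1, step):
        yield series[start:start + size]


if __name__ == "__main__":
    rng = random.Random(42)
    a = [rng.gauss(0, 1) for _ in range(256)]
    b = [v + 0.5 * rng.gauss(0, 1) for v in a]
    vectors = make_random_vectors(200, 64, seed=1)
    for wa, wb in zip(sliding_windows(a, 64, 64), sliding_windows(b, 64, 64)):
        approx = estimate_correlation(sketch(wa, vectors), sketch(wb, vectors))
        print(f"exact={pearson(wa, wb):.3f}  sketch estimate={approx:.3f}")
```

The estimate is only approximate, and its accuracy improves as k grows. In this naive form each sketch costs k times n multiplications per window, so it is not faster than the exact computation for a single pair; the benefit of sketches comes when many pairs of streams are compared and the sketches are reused or updated incrementally.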

References

[1] A. Gionis, P. Indyk and R. Motwani, Similarity Search in High Dimensions via Hashing, VLDB, 518--529, 1999.
[2] S. Mallat and Z. Zhang, Matching Pursuit With Time-Frequency Dictionaries, IEEE Transactions on Signal Processing, 1993.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory, New York, Wiley, 1991.
[4] R. Cole, D. Shasha and X. J. Zhao, Fast Window Correlations Over Uncooperative Time Series, Department of Computer Science, New York University, New York, NY, Technical Report, 2005.
[5] C. Alexander, Market Models: A Guide to Financial Data Analysis, John Wiley & Sons, 2001.
[6] Wharton Research Data Services (WRDS), http://wrds.wharton.upenn.edu/
[7] E. Keogh and T. Folias, The UCR Time Series Data Mining Archive. Riverside CA. University of California - Computer Science & Engineering Department, http://www.cs.ucr.edu/~eamonn/TSDMA/index.html, 2002.
[8] B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall/CRC, 1994.
[9] D. M. Cohen, S. R. Dalal, J. Parelius and G. C. Patton, The Combinatorial Design Approach to Automatic Test Generation, IEEE Software, 13, 83--87, 1996.
[10] D. Shasha and Y. Zhu, High Performance Discovery in Time Series: Techniques and Case Studies, Springer, 2003.
[11] D. Achlioptas, Database-friendly Random Projections, ACM SIGMOD-PODS, May, Santa Barbara, CA, 2001.
[12] P. Indyk, Stable distributions, pseudorandom generators, embeddings and data stream computation, Proceedings of the 41st Annual Symposium on Foundations of Computer Science, 0-7695-0850-2, 189, IEEE Computer Society, 2000.
[13] E. Kushilevitz, R. Ostrovsky and Y. Rabani, Efficient Search for Approximate Nearest Neighbors in High Dimensional Spaces, STOC, 1998.
[14] Y. Zhu and D. Shasha, StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time, VLDB, Hong Kong, China, August, 2002.
[15] A. C. Gilbert, Y. Kotidis, S. Muthukrishnan and M. Strauss, Surfing wavelets on streams: One-pass summaries for approximate aggregate queries, VLDB, 2001.
[16] N. Thaper, S. Guha, P. Indyk and N. Koudas, Dynamic multidimensional histograms, SIGMOD, Madison, Wisconsin, 2002.
[17] P. Indyk, N. Koudas and S. Muthukrishnan, Identifying Representative Trends in Massive Time Series Data Sets using Sketches, VLDB, 2000.
[18] G. Cormode, P. Indyk, N. Koudas and S. Muthukrishnan, Fast mining of massive tabular data via approximate distance computations, ICDE, 2002.
[19] W. Johnson and J. Lindenstrauss, Extensions of Lipschitz mapping into Hilbert space, Contemporary Mathematics, 26, 189--206, 1984.
[20] M. Vlachos, M. Hadjieleftheriou, D. Gunopulos and E. Keogh, Indexing multi-dimensional time-series with support for multiple distance measures, SIGKDD, 2003.
[21] E. Keogh, Exact indexing of dynamic time warping, VLDB, 2002.
[22] B. K. Yi and C. Faloutsos, Fast time sequence indexing for arbitrary Lp forms, VLDB, 2000.
[23] T. Palpanas, M. Vlachos, E. Keogh, D. Gunopulos and W. Truppel, Online amnesic approximation of streaming time series, ICDE, 2004.
[24] E. Keogh, K. Chakrabarti, S. Mehrotra and M. Pazzani, Locally Adaptive Dimensionality Reduction for Indexing large Time Series Databases, SIGMOD, 2001.
[25] F. Korn, H. V. Jagadish and C. Faloutsos, Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences, SIGMOD, 1997.
[26] Y. L. Wu, D. Agrawal and A. E. Abbadi, A comparison of dft and dwt based similarity search in time-series databases, The 9th ACM CIKM Int'l Conference on Information and Knowledge Management, 2000.
[27] I. Popivanov and R. Miller, Similarity Search Over Time Series Data Using Wavelets, ICDE, 2002.
[28] K. Chan and A. W. Fu, Efficient Time Series Matching by Wavelets, ICDE, 1999.
[29] D. Rafiei and A. Mendelzon, Similarity-based queries for time series data, SIGMOD, 1997.
[30] C. S. Li, P. S. Yu and V. Castelli, Hierarchyscan: A hierarchical similarity search algorithm for databases of long sequences, ICDE, 1996.
[31] E. Drinea, P. Drineas and P. Huggins, A Randomized Singular Value Decomposition Algorithm for Image Processing, Panhellenic Conference on Informatics (PCI), 2001.
[32] G. Manku, S. Rajagopalan and B. Lindsay, Random Sampling Techniques for Space Efficient Online Computation of order Statistics of Large Datasets, SIGMOD, 1999.
[33] M. Greenwald and S. Khanna, Space-Efficient Online Computation of Quantile Summaries, SIGMOD, 2001.
[34] R. Agrawal, C. Faloutsos and A. Swami, Efficient Similarity Searching In Sequence Databases, Proceedings of the 4th International Conference of Foundations of Data organization and Algorithms (FODO), Springer Verlag, Chicago, Illinois, 69--84, 1993.
[35] C. Faloutsos, M. Ranganathan and Y. Manolopoulos, Fast subsequence matching in time-series databases, SIGMOD, Minneapolis, MN, May, 1994.
[36] E. Keogh, K. Chakrabarti, M. Pazzani and S. Mehrotra, Dimensionality Reduction for fast similarity search in large time series databases, Knowledge and Information Systems, 3, 263--286, 2000.

Published in
KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN: 159593135X
DOI: 10.1145/1081870
General Chair: Robert Grossman (University of Illinois at Chicago & Open Data Partners, USA)
Program Chairs: Roberto Bayardo (IBM Almaden Research, USA), Kristin Bennett (RPI, USA)
Copyright © 2005 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: correlation, randomized algorithms, time series