
Debunking Four Long-Standing Misconceptions of Time-Series Distance Measures

Authors:
John Paparrizos, University of Chicago, Chicago, IL, USA
Chunwei Liu, University of Chicago, Chicago, IL, USA
Aaron J. Elmore, University of Chicago, Chicago, IL, USA
Michael J. Franklin, University of Chicago, Chicago, IL, USA

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, June 2020, Pages 1887–1905. https://doi.org/10.1145/3318464.3389760

Published: 31 May 2020


ABSTRACT

Distance measures are core building blocks in time-series analysis and have been the subject of active research for decades. Unfortunately, the most detailed experimental study in this area is outdated (over a decade old) and, naturally, does not reflect recent progress. Importantly, this study (i) omitted multiple distance measures, including a classic measure in the time-series literature; (ii) considered only a single time-series normalization method; and (iii) reported only raw classification error rates without statistically validating the findings, resulting in or fueling four misconceptions in the time-series literature. Motivated by these drawbacks and by our curiosity to shed light on these misconceptions, we comprehensively evaluate 71 time-series distance measures. Specifically, our study includes (i) 8 normalization methods; (ii) 52 lock-step measures; (iii) 4 sliding measures; (iv) 7 elastic measures; (v) 4 kernel functions; and (vi) 4 embedding measures. We extensively evaluate these measures across 128 time-series datasets using rigorous statistical analysis. Our findings debunk four long-standing misconceptions that significantly alter the landscape of what is known about existing distance measures. With the new foundations in place, we discuss open challenges and promising directions.
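To make the categories named in the abstract more concrete, the following is a minimal Python sketch that contrasts one lock-step measure with one elastic measure after normalizing both series. The abstract does not name individual methods, so the choice of z-normalization, Euclidean distance, and unconstrained dynamic time warping (DTW) as representatives is an assumption of this sketch. The function names are hypothetical, and the code is not the authors' implementation.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit variance (population std)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0.0:
        # Constant series: return zeros rather than divide by zero.
        return [0.0] * n
    return [(v - mean) / std for v in series]


def euclidean_distance(x, y):
    """Lock-step measure: the i-th point of x is compared only with the i-th point of y."""
    if len(x) != len(y):
        raise ValueError("lock-step comparison needs equal-length series")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_distance(x, y):
    """Elastic measure: unconstrained DTW with squared point-wise costs."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of x with the first j points of y.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


# Two toy series with the same bump, shifted by one time step (illustrative data only).
a = z_normalize([0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0])
b = z_normalize([0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0])
ed = euclidean_distance(a, b)
dtw = dtw_distance(a, b)
```

On this toy pair the lock-step distance `ed` is clearly positive because the bump is misaligned, while the elastic distance `dtw` is numerically zero because warping absorbs the one-step shift. This only illustrates how the two categories differ in what they compare. It says nothing about which one is more accurate on real datasets, which is the question the study evaluates.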


References

Amaia Abanda, Usue Mori, and Jose A Lozano. 2019. A review on distance based time series classification. Data Mining and Knowledge Discovery 33, 2 (2019), 378--412.Google ScholarDigital Library
Rakesh Agrawal, Christos Faloutsos, and Arun N. Swami. 1993. Efficient Similarity Search In Sequence Databases. In FODO. 69--84.Google ScholarDigital Library
Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, and Kyuseok Shim. 1995. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In Proceeding of the 21th International Conference on Very Large Data Bases. Citeseer, 490--501.Google ScholarDigital Library
Shadab Alam, Franco D Albareti, Carlos Allende Prieto, Friedrich Anders, Scott F Anderson, Timothy Anderton, Brett H Andrews, Eric Armengaud, Éric Aubourg, Stephen Bailey, et al.2015. The eleventh and twelfth data releases of the Sloan Digital Sky Survey: final data from SDSS-III. The Astrophysical Journal Supplement Series 219, 1(2015), 12.Google ScholarCross Ref
Jonathan Alon, Stan Sclaroff, George Kollios, and Vladimir Pavlovic. 2003. Discovering clusters in motion time-series data. In CVPR. 375--381.Google Scholar
Francisco Martinez Alvarez, Alicia Troncoso, Jose C Riquelme, and Jesus S Aguilar Ruiz. 2010. Energy time series forecasting based on pattern sequence similarity. IEEE Transactions on Knowledge and Data Engineering 23, 8 (2010), 1230--1243.Google ScholarDigital Library
Henrik André-Jönsson and Dushan Z Badal. 1997. Using signature files for querying time-series data. In European Symposium on Principles of Data Mining and Knowledge Discovery. Springer, 211--220.Google ScholarCross Ref
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Peter Kunath, Alexey Pryakhin, and Matthias Renz. 2006. Similarity search on time series based on threshold queries. In International Conference on Extending Database Technology. Springer, 276--294.Google Scholar
Martin Bach-Andersen, Bo Rømer-Odgaard, and Ole Winther. 2017. Flexible non-linear predictive models for large-scale wind turbine diagnostics. Wind Energy 20, 5 (2017), 753--764.Google ScholarCross Ref
Anthony Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn Keogh. 2018.The UEA multivariate time series classification archive, 2018. arXivpreprint arXiv:1811.00075(2018).Google Scholar
Anthony Bagnall, Jason Lines, Aaron Bostrom, James Large, and Eamonn Keogh. 2017. The great time series classification bake off: are view and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery 31, 3 (2017), 606--660.Google ScholarDigital Library
Anthony J Bagnall and Gareth J Janacek. 2004. Clustering time series from ARMA models with clipped data. In KDD. 49--58.Google Scholar
Ziv Bar-Joseph. 2004. Analyzing time series gene expression data. Bioinformatics 20, 16 (2004), 2493--2503.Google ScholarDigital Library
Ziv Bar-Joseph, Georg K Gerber, David K Gifford, Tommi S Jaakkola, and Itamar Simon. 2003. Continuous representations of time-series gene expression data.Journal of Computational Biology 10, 3--4 (2003),341--356.Google Scholar
Ziv Bar-Joseph, Anthony Gitter, and Itamar Simon. 2012. Studying and modelling dynamic biological processes using time-series gene expression data.Nature Reviews Genetics13, 8 (2012), 552.Google Scholar
Gustavo EAPA Batista, Eamonn J Keogh, Oben Moses Tataw, and Vinicius MA De Souza. 2014. CID: an efficient complexity-invariant distance for time series.Data Mining and Knowledge Discovery 28, 3(2014), 634--669.Google Scholar
Nurjahan Begum and Eamonn Keogh. 2014. Rare time series motif discovery from unbounded streams. Proceedings of the VLDB Endowment 8, 2 (2014), 149--160.Google ScholarDigital Library
Donald J Berndt and James Clifford. 1994. Using Dynamic TimeWarping to Find Patterns in Time Series. In AAAI Workshop on KDD. 359--370.Google Scholar
Bharat B Biswal, Maarten Mennes, Xi-Nian Zuo, Suril Gohel, ClareKelly, Steve M Smith, Christian F Beckmann, Jonathan S Adelstein, Randy L Buckner, Stan Colcombe, et al. 2010. Toward discovery science of human brain function. Proceedings of the National Academy of Sciences 107, 10 (2010), 4734--4739.Google ScholarCross Ref
R Bracewell. 1965. Pentagram notation for cross correlation. The Fourier transform and its applications. New York: McGraw-Hill46(1965), 243.Google Scholar
Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. 2000. LOF: identifying density-based local outliers. In ACM sigmod record, Vol. 29. ACM, 93--104.Google ScholarDigital Library
Peter J Brockwell and Richard A Davis. 2016.Introduction to timeseries and forecasting. springer.Google Scholar
Lisa Gottesfeld Brown. 1992. A survey of image registration techniques. ACM computing surveys (CSUR)24, 4 (1992), 325--376.Google Scholar
Yuhan Cai and Raymond Ng. 2004. Indexing spatio-temporal trajectories with Chebyshev polynomials. In SIGMOD. 599--610.Google Scholar
Alessandro Camerra, Themis Palpanas, Jin Shieh, and Eamonn Keogh. 2010. iSAX 2.0: Indexing and mining one billion time series. In 2010 IEEE International Conference on Data Mining. IEEE, 58--67.Google ScholarDigital Library
Sung-Hyuk Cha. 2007. Comprehensive survey on distance/similarity measures between probability density functions. City1, 2 (2007), 1.Google Scholar
Lei Chen and Raymond Ng. 2004. On the marriage of Lp-norms and edit distance. InVLDB. 792--803.Google Scholar
Lei Chen, M Tamer Özsu, and Vincent Oria. 2005. Robust and fast similarity search for moving object trajectories. In SIGMOD. 491--502.Google Scholar
Qiuxia Chen, Lei Chen, Xiang Lian, Yunhao Liu, and Jeffrey Xu Yu.2007. Indexable PLA for efficient similarity search. In VLDB. 435--446.Google Scholar
Yueguo Chen, Mario A Nascimento, Beng Chin Ooi, and Anthony KHTung. 2007. Spade: On shape-based pattern detection in streaming time series. In ICDE. 786--795.Google Scholar
Bill Chiu, Eamonn Keogh, and Stefano Lonardi. 2003. Probabilistic discovery of time series motifs. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 493--498.Google ScholarDigital Library
Kelvin Kam Wing Chu and Man Hon Wong. 1999. Fast time-series searching with scaling and shifting. In Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. Citeseer, 237--248.Google Scholar
Richard Cole, Dennis Shasha, and Xiaojian Zhao. 2005. Fast window correlations over uncooperative time series. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, 743--749.Google ScholarDigital Library
James W Cooley and John W Tukey. 1965. An algorithm for the machine calculation of complex Fourier series. Math. Comp.19, 90(1965), 297--301.Google ScholarCross Ref
Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning 20, 3 (1995), 273--297.Google ScholarDigital Library
Madalena Costa, Ary L Goldberger, and C-K Peng. 2002. Multiscale entropy analysis of complex physiologic time series.Physical review letters 89, 6 (2002), 068102.Google Scholar
Nello Cristianini and John Shawe-Taylor. 2000. An introduction to support vector machines and other kernel-based learning methods. Cambridge university press.Google ScholarCross Ref
Marco Cuturi. 2011. Fast global alignment kernels. In Proceedings of the 28th international conference on machine learning (ICML-11). 929--936.Google ScholarDigital Library
Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, and Themis Palpanas. 2012. Uncertain time-series similarity: Return to the basics. Proceedings of the VLDB Endowment 5, 11 (2012), 1662--1673.Google ScholarDigital Library
Michele Dallachiesa, Themis Palpanas, and Ihab F Ilyas. 2014. Top-k nearest neighbor search in uncertain data series.Proceedings of the VLDB Endowment 8, 1 (2014), 13--24.Google Scholar
Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia MichaelYeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana,Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, Gustavo Batista, and Hexagon-ML. 2018. The UCR Time Series Classification Archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.Google Scholar
Janez Demar. 2006. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research 7 (2006),1--30.Google ScholarDigital Library
Michel-Marie Deza and Elena Deza. 2006.Dictionary of distances. Elsevier.Google Scholar
Michel Marie Deza and Elena Deza. 2009. Encyclopedia of distances. In Encyclopedia of distances. Springer, 1--583.Google Scholar
Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn Keogh. 2008. Querying and mining of time series data: experimental comparison of representations and distance measures. Proceedings of the VLDB Endowment 1, 2 (2008), 1542--1552.Google ScholarDigital Library
Rui Ding, Qiang Wang, Yingnong Dang, Qiang Fu, Haidong Zhang,and Dongmei Zhang. 2015. Yading: Fast clustering of large-scale time series data. Proceedings of the VLDB Endowment 8, 5 (2015), 473--484.Google ScholarDigital Library
Alejandro Domínguez. 2015. A history of the convolution operation [Retrospectroscope]. IEEE pulse6, 1 (2015), 38--49.Google ScholarCross Ref
Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas, and Houda Benbrahim. 2018. The lernaean hydra of data series similarity search:An experimental evaluation of the state of the art. Proceedings of the VLDB Endowment 12, 2 (2018), 112--127.Google ScholarDigital Library
Jason Ernst and Ziv Bar-Joseph. 2006. STEM: a tool for the analysis of short time series gene expression data. BMC bioinformatics 7, 1(2006), 191.Google Scholar
Philippe Esling and Carlos Agon. 2012. Time-series data mining. ACM Computing Surveys (CSUR)45, 1 (2012), 12.Google Scholar
Christos Faloutsos, M. Ranganathan, and Yannis Manolopoulos. 1994. Fast Subsequence Matching in Time-series Databases. In SIGMOD. 419--429.Google Scholar
Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do we need hundreds of classifiers to solve real world classification problems?The journal of machine learning research15,1 (2014), 3133--3181.Google Scholar
Elias Frentzos, Kostas Gratsias, and Yannis Theodoridis. 2007. Index-based most similar trajectory search. In ICDE. 816--825.Google Scholar
Milton Friedman. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Amer. Statist. Assoc. 32 (1937), 675--701.Google ScholarCross Ref
Daniel G Gavin, W Wyatt Oswald, Eugene R Wahl, and John W Williams. 2003. A statistical approach to evaluating distance metrics and analog assignments for pollen records.Quaternary Research 60, 3 (2003), 356--367.Google Scholar
Martin Gavrilov, Dragomir Anguelov, Piotr Indyk, and Rajeev Motwani. 2000. Mining the stock market: Which measure is best. In Proc. of the 6th ACM SIGKDD. 487--496.Google Scholar
Rafael Giusti and Gustavo EAPA Batista. 2013. An Empirical Comparison of Dissimilarity Measures for Time Series Classification. In BRACIS. 82--88.Google Scholar
Steve Goddard, Sherri K Harms, Stephen E Reichenbach, Tsegaye Tadesse, and William J Waltman. 2003. Geospatial decision support for drought risk management. Commun. ACM46, 1 (2003), 35--37.Google Scholar
Dina Q Goldin and Paris C Kanellakis. 1995. On similarity queries for time-series data: constraint specification and implementation. In International Conference on Principles and Practice of Constraint Programming. Springer, 137--153.Google ScholarCross Ref
Tomasz Górecki and Maciej Luczak. 2013. Using derivatives in time series classification.Data Mining and Knowledge Discovery 26, 2(2013), 310--331.Google Scholar
Aditya Grover, Ashish Kapoor, and Eric Horvitz. 2015. A deep hybrid model for weather forecasting. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 379--386.Google ScholarDigital Library
Joel Grus. 2019. Data science from scratch: first principles with python. O'Reilly Media.Google Scholar
Jon Hills, Jason Lines, Edgaras Baranauskas, James Mapp, and Anthony Bagnall. 2014. Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery 28, 4 (2014), 851--881.Google ScholarDigital Library
Ove Hoegh-Guldberg, Peter J Mumby, Anthony J Hooten, Robert S Steneck, Paul Greenfield, Edgardo Gomez, C Drew Harvell, Peter FSale, Alasdair J Edwards, Ken Caldeira, et al. 2007. Coral reefs under rapid climate change and ocean acidification. Science 318, 5857 (2007), 1737--1742.Google Scholar
Rie Honda, Shuai Wang, Tokio Kikuchi, and Osamu Konishi. 2002.Mining of moving objects from time-series images and its application to satellite weather imagery. Journal of Intelligent Information Systems 19, 1 (2002), 79--93.Google ScholarDigital Library
Bing Hu, Yanping Chen, and Eamonn Keogh. 2013. Time Series Classification under More Realistic Assumptions. In SDM. 578--586.Google Scholar
Pablo Huijse, Pablo A Estevez, Pavlos Protopapas, Jose C Principe, and Pablo Zegers. 2014. Computational intelligence challenges and applications on large-scale astronomical time series databases. IEEE Computational Intelligence Magazine 9, 3 (2014), 27--39.Google ScholarDigital Library
Young-Seon Jeong, Myong K Jeong, and Olufemi A Omitaomu. 2011. Weighted dynamic time warping for time series classification. Pattern Recognition 44, 9 (2011), 2231--2240.Google ScholarDigital Library
Konstantinos Kalpakis, Dhiral Gada, and Vasundhara Puttagunta.2001. Distance measures for effective clustering of ARIMA time-series. In ICDM. 273--280.Google Scholar
Kunio Kashino, Gavin Smith, and Hiroshi Murase. 1999. Time-series active search for quick retrieval of audio and video. In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No. 99CH36258), Vol. 6. IEEE, 2993--2996.Google ScholarDigital Library
Shrikant Kashyap and Panagiotis Karras. 2011. Scalable knn search on vertically stored time series. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1334--1342.Google ScholarDigital Library
Eamonn Keogh. 2006. A decade of progress in indexing and mining large time series databases. In VLDB. 1268--1268.Google Scholar
Eamonn Keogh, Kaushik Chakrabarti, Michael Pazzani, and Sharad Mehrotra. 2001. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. In SIGMOD. 151--162.Google Scholar
Eamonn Keogh and Jessica Lin. 2005. Clustering of time-series subsequences is meaningless: Implications for previous and future research. Knowledge and Information Systems 8, 2 (2005), 154--177.Google ScholarDigital Library
Eamonn Keogh and Chotirat Ann Ratanamahatana. 2005. Exact indexing of dynamic time warping. Knowledge and Information Systems 7, 3 (2005), 358--386.Google ScholarDigital Library
Chan Kin-pong and Fu Ada. 1999. Efficient Time Series Matching by Wavelets. In ICDE. 126--133.Google Scholar
S Knieling, J Niediek, E Kutter, J Bostroem, CE Elger, and F Mormann. 2017. An online adaptive screening procedure for selective neuronal responses. Journal of neuroscience methods291 (2017), 36--42.Google Scholar
Flip Korn, H. V. Jagadish, and Christos Faloutsos. 1997. Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences. In SIGMOD. 289--300.Google Scholar
Yann LeCun, Yoshua Bengio, et al.1995. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361, 10 (1995), 1995.Google Scholar
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436.Google Scholar
Yann A LeCun, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. 2012. Efficient backprop. InNeural networks: Tricks of the trade. Springer, 9--48.Google Scholar
Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, and Inderjit S Dhillon.2017. Similarity preserving representation learning for time series analysis. arXiv preprint arXiv:1702.03584(2017).Google Scholar
Chung-Sheng Li, Philip S. Yu, and Vittorio Castelli. 1996. Hierarchyscan: A hierarchical similarity search algorithm for databases of long sequences. In ICDE. IEEE, 546--553.Google Scholar
Xiang Lian, Lei Chen, Jeffrey Xu Yu, Guoren Wang, and Ge Yu. 2007. Similarity match over high speed time-series streams. InICDE. 1086--1095.Google Scholar
Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos. 2004. Iterative incremental clustering of time series. In EDBT. 106--122.Google Scholar
Michele Linardi and Themis Palpanas. 2018. Scalable, variable-length similarity search in data series: The ULISSE approach. Proceedings of the VLDB Endowment 11, 13 (2018), 2236--2248.Google ScholarDigital Library
Jason Lines and Anthony Bagnall. 2015. Time series classification with ensembles of elastic distance measures.Data Mining and Knowledge Discovery 29, 3 (2015), 565--592.Google Scholar
Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation forest. In2008 Eighth IEEE International Conference on Data Mining. IEEE, 413--422.Google ScholarDigital Library
Helmut Lütkepohl, Markus Krätzig, and Peter CB Phillips. 2004. Applied time series econometrics. Cambridge university press.Google Scholar
Mohammad Saeid Mahdavinejad, Mohammadreza Rezvan, Moham-madamin Barekatain, Peyman Adibi, Payam Barnaghi, and Amit P Sheth. 2017. Machine learning for Internet of Things data analysis: Asurvey. Digital Communications and Networks(2017).Google Scholar
Rosario N Mantegna. 1999. Hierarchical structure in financial markets.The European Physical Journal B-Condensed Matter and Complex Systems 11, 1 (1999), 193--197.Google Scholar
Pierre-François Marteau. 2008. Time warp edit distance with stiffness adjustment for time series matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 2 (2008), 306--318.Google ScholarDigital Library
Pierre-François Marteau and Sylvie Gibet. 2014. On recursive edit distance kernels with application to time series classification. IEEE transactions on neural networks and learning systems 26, 6 (2014),1121--1133.Google Scholar
Francisco Martínez-Álvarez, Alicia Troncoso, Gualberto Asencio-Cortés, and José Riquelme. 2015. A survey on data mining techniques applied to electricity-related time series forecasting. Energies 8, 11(2015), 13162--13193.Google ScholarCross Ref
Richard McCleary, Richard A Hay, Erroll E Meidinger, and David McDowall. 1980.Applied time series analysis for the social sciences. Sage Publications Beverly Hills, CA.Google Scholar
Vasileios Megalooikonomou, Qiang Wang, Guo Li, and Christos Faloutsos. 2005. A multiresolution symbolic representation of time series. In Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on. IEEE, 668--679.Google ScholarDigital Library
Katsiaryna Mirylenka, Vassilis Christophides, Themis Palpanas, Ioannis Pefkianakis, and Martin May. 2016. Characterizing home device usage from wireless traffic time series.Google Scholar
Katsiaryna Mirylenka, Michele Dallachiesa, and Themis Palpanas. 2017. Data series similarity using correlation-aware measures. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management. 1--12.Google ScholarDigital Library
A Morales-Esteban, Francisco Martínez-Álvarez, A Troncoso, JL Justo,and Cristina Rubio-Escudero. 2010. Pattern recognition to forecast seismic time series.Expert Systems with Applications 37, 12 (2010),8333--8342.Google Scholar
Michael D Morse and Jignesh M Patel. 2007. An efficient and accurate method for evaluating time series similarity. In SIGMOD. 569--580.Google Scholar
Abdullah Mueen, Eamonn Keogh, and Neal Young. 2011. Logical-shapelets: An expressive primitive for time series classification. In KDD. 1154--1162.Google ScholarDigital Library
Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, and Brandon Westover. 2009. Exact discovery of time series motifs. In Proceedings of the 2009 SIAM international conference on data mining. SIAM, 473--484.Google ScholarCross Ref
Abdullah Mueen, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Gupta, and Eamonn Keogh. 2017.The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance. http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html.Google Scholar
Peter Nemenyi. 1963. Distribution-free Multiple Comparisons. Ph.D. Dissertation. Princeton University.Google Scholar
Themis Palpanas. 2015. Data series management: the road to big sequence analytics. ACM SIGMOD Record 44, 2 (2015), 47--52.Google ScholarDigital Library
Themis Palpanas. 2016. Big sequence management: A glimpse of the past, the present, and the future. InInternational Conference onCurrent Trends in Theory and Practice of Informatics. Springer, 63--80.Google ScholarDigital Library
Panagiotis Papapetrou, Vassilis Athitsos, Michalis Potamias, GeorgeKollios, and Dimitrios Gunopulos. 2011. Embedding-based subsequence matching in time-series databases. TODS 36, 3 (2011), 17.Google ScholarDigital Library
John Paparrizos. 2019. 2018 UCR Time-Series Archive: Backward Compatibility, Missing Values, and Varying Lengths. https://github.com/johnpaparrizos/UCRArchiveFixes.Google Scholar
John Paparrizos and Michael J Franklin. 2019. GRAIL: efficient time-series representation learning. Proceedings of the VLDB Endowment12, 11 (2019), 1762--1777.Google ScholarDigital Library
John Paparrizos and Luis Gravano. 2015. k-shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 1855--1870.Google ScholarDigital Library
John Paparrizos and Luis Gravano. 2017. Fast and Accurate Time-Series Clustering. ACM Transactions on Database Systems (TODS)42, 2 (2017), 8.Google Scholar
Athanasios Papoulis. 1962. The Fourier integral and its applications. McGraw-Hill.Google Scholar
C-K Peng, Shlomo Havlin, H Eugene Stanley, and Ary L Goldberger. 1995. Quantification of scaling exponents and crossover phenomenain nonstationary heartbeat time series. Chaos: An Interdisciplinary Journal of Nonlinear Science 5, 1 (1995), 82--87.Google ScholarCross Ref
François Petitjean, Germain Forestier, Geoffrey I Webb, Ann E Nicholson, Yanping Chen, and Eamonn Keogh. 2014. Dynamic time warping averaging of time series allows faster and more accurate classification. In 2014 IEEE international conference on data mining. IEEE, 470--479.Google ScholarDigital Library
François Petitjean, Germain Forestier, Geoffrey I Webb, Ann E Nicholson, Yanping Chen, and Eamonn Keogh. 2016. Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm. Knowledge and Information Systems 47, 1 (2016), 1--26.Google ScholarDigital Library
François Petitjean, Alain Ketterlin, and Pierre Gançarski. 2011. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition 44, 3 (2011), 678--693.Google ScholarDigital Library
Davood Rafiei and Alberto Mendelzon. 1997. Similarity-based queries for time series data. In ACM SIGMOD Record, Vol. 26. ACM, 13--25.Google ScholarDigital Library
Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. 2012. Searching and mining trillions of time series subsequences under dynamic time warping. InKDD. 262--270.Google Scholar
Chotirat Ann Ralanamahatana, Jessica Lin, Dimitrios Gunopulos, Eamonn Keogh, Michail Vlachos, and Gautam Das. 2005. Mining time series data. InData mining and knowledge discovery handbook. Springer, 1069--1103.Google Scholar
Chotirat Ann Ratanamahatana and Eamonn Keogh. 2004. Making time-series classification more accurate using learned constraints. In SDM. 11--22.Google Scholar
Usman Raza, Alessandro Camerra, Amy L Murphy, Themis Palpanas, and Gian Pietro Picco. 2015. Practical data prediction for real-world wireless sensor networks.IEEE Transactions on Knowledge and DataEngineering 27, 8 (2015), 2231--2244.Google ScholarDigital Library
John Rice. 2006.Mathematical statistics and data analysis. Cengage Learning.Google Scholar
Joshua S Richman and J Randall Moorman. 2000. Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology 278, 6(2000), H2039--H2049.Google ScholarCross Ref
Kexin Rong, Clara E Yoon, Karianne J Bergen, Hashem Elezabi, Peter Bailis, Philip Levis, and Gregory C Beroza. 2018. Locality-sensitive hashing for earthquake detection: A case study of scaling data-driven science. Proceedings of the VLDB Endowment11, 11 (2018), 1674--1687.Google ScholarDigital Library
Eduardo J Ruiz, Vagelis Hristidis, Carlos Castillo, Aristides Gionis, and Alejandro Jaimes. 2012. Correlating financial time series with micro-blogging activity. InProceedings of the fifth ACM international conference on Web search and data mining. ACM, 513--522.Google Scholar
Hiroaki Sakoe and Seibi Chiba. 1971. A dynamic programming approach to continuous speech recognition. In ICA. 65--69.Google Scholar
Hiroaki Sakoe and Seibi Chiba. 1978. Dynamic programming algorithm optimization for spoken word recognition. IEEE transactions on acoustics, speech, and signal processing 26, 1 (1978), 43--49.Google ScholarCross Ref
Yasushi Sakurai, Spiros Papadimitriou, and Christos Faloutsos. 2005.Braid: Stream mining through group lag correlations. In SIGMOD. ACM, 599--610.Google Scholar
Patrick Schäfer and Mikael Högqvist. 2012. SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets. InProceedings of the 15th International Conference on Extend-ing Database Technology. ACM, 516--527.Google ScholarDigital Library
Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. 1997. Kernel principal component analysis. InInternational Conference on Artificial Neural Networks. Springer, 583--588.Google ScholarCross Ref
Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural computation 10, 5 (1998), 1299--1319.Google ScholarDigital Library
Bernhard Schölkopf and Alexander J Smola. 2002. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press.Google Scholar
Pavel Senin, Jessica Lin, Xing Wang, Tim Oates, Sunil Gandhi,Arnold P Boedihardjo, Crystal Chen, and Susan Frankenstein. 2015. Time series anomaly discovery with grammar-based compression. In Edbt. 481--492.Google Scholar
Dennis Shasha. 1999. Tuning time series queries in finance: Case studies and recommendations. IEEE Data Eng. Bull. 22, 2 (1999),40--46.Google Scholar
Jin Shieh and Eamonn Keogh. 2008. i SAX: indexing and mining terabyte sized time series. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining.ACM, 623--631.Google ScholarDigital Library
Yutao Shou, Nikos Mamoulis, and David Cheung. 2005. Fast and exact warping of time series using adaptive segmental approximations.Machine Learning 58, 2--3 (2005), 231--267.Google Scholar
Alexandra Stefan, Vassilis Athitsos, and Gautam Das. 2013. The move-split-merge metric for time series. TKDE 25, 6 (2013), 1425--1438.Google ScholarDigital Library
Ruey S Tsay. 2014. Financial Time Series. Wiley StatsRef: Statistics Reference Online(2014), 1--23.Google ScholarCross Ref
Kuniaki Uehara and Mitsuomi Shimada. 2002. Extraction of primitive motion and discovery of association rules from human motion data. In Progress in Discovery Science. Springer, 338--348.Google Scholar
Michail Vlachos, Marios Hadjieleftheriou, Dimitrios Gunopulos, and Eamonn Keogh. 2006. Indexing multidimensional time-series. The VLDB Journal 15, 1 (2006), 1--20.Google ScholarDigital Library
Michail Vlachos, George Kollios, and Dimitrios Gunopulos. 2002. Discovering similar multidimensional trajectories. In Proceedings 18th international conference on data engineering. IEEE, 673--684.Google ScholarCross Ref
Gabriel Wachman, Roni Khardon, Pavlos Protopapas, and Charles R Alcock. 2009. Kernels for periodic time series arising in astronomy. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 489--505.Google ScholarCross Ref
Hao Wang, Yilun Cai, Yin Yang, Shiming Zhang, and Nikos Mamoulis. 2014. Durable Queries over Historical Time Series. TKDE 26, 3 (2014),595--607.Google ScholarDigital Library
Xiaoyue Wang, Abdullah Mueen, Hui Ding, Goce Trajcevski, Peter Scheuermann, and Eamonn Keogh. 2013. Experimental comparison of representation methods and distance measures for time series data. Data Mining and Knowledge Discovery(2013), 1--35.Google Scholar
Xiaozhe Wang, Kate Smith, and Rob Hyndman. 2006. Characteristic-based clustering for time series data. Data mining and knowledge Discovery 13, 3 (2006), 335--364.Google Scholar
Yang Wang, Peng Wang, Jian Pei, Wei Wang, and Sheng Huang. 2013. A data-adaptive and dynamic segmentation index for whole matching on time series. Proceedings of the VLDB Endowment 6, 10 (2013), 793--804.Google ScholarDigital Library
T Warren Liao. 2005. Clustering of time series data - a survey. Pattern Recognition 38, 11 (2005), 1857--1874.Google ScholarDigital Library
Peter J Webster, Greg J Holland, Judith A Curry, and H-R Chang. 2005. Changes in tropical cyclone number, duration, and intensity in a warming environment. Science 309, 5742 (2005), 1844--1846.
Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics Bulletin (1945), 80--83.
Billy M Williams and Lester A Hoel. 2003. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of Transportation Engineering 129, 6 (2003), 664--672.
Lingfei Wu, Ian En-Hsu Yen, Jinfeng Yi, Fangli Xu, Qi Lei, and Michael Witbrock. 2018. Random Warping Series: A Random Features Method for Time-Series Embedding. In AISTATS. 793--802.
Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, and Chotirat Ann Ratanamahatana. 2006. Fast time series classification using numerosity reduction. In Proceedings of the 23rd international conference on Machine learning. ACM, 1033--1040.
Yimin Xiong and Dit-Yan Yeung. 2002. Mixtures of ARMA models for model-based time series clustering. In ICDM. 717--720.
Jaewon Yang and Jure Leskovec. 2011. Patterns of temporal variation in online media. In WSDM. 177--186.
Dragomir Yankov, Eamonn Keogh, and Umaa Rebbapragada. 2008. Disk aware discord discovery: Finding unusual time series in terabyte sized datasets. Knowledge and Information Systems 17, 2 (2008), 241--262.
Lexiang Ye and Eamonn Keogh. 2009. Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 947--956.
Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. 2016. Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM). IEEE, 1317--1322.
Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Zachary Zimmerman, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. 2018. Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile. Data Mining and Knowledge Discovery 32, 1 (2018), 83--123.
Mi-Yen Yeh, Kun-Lung Wu, Philip S Yu, and Ming-Syan Chen. 2009. PROUD: a probabilistic approach to processing similarity queries over uncertain data streams. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. ACM, 684--695.
Byoung-Kee Yi and Christos Faloutsos. 2000. Fast time sequence indexing for arbitrary Lp norms. VLDB.
Jesin Zakaria, Abdullah Mueen, and Eamonn Keogh. 2012. Clustering Time Series Using Unsupervised-Shapelets. In ICDM. 785--794.
Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. 2006. Similarity search: the metric space approach. Vol. 32. Springer Science & Business Media.
Guoqing Zheng, Yiming Yang, and Jaime Carbonell. 2016. Efficient shift-invariant dictionary learning. In SIGKDD. ACM, 2095--2104.
Kostas Zoumpatianos, Stratos Idreos, and Themis Palpanas. 2016. ADS: the adaptive data series index. The VLDB Journal-The International Journal on Very Large Data Bases 25, 6 (2016), 843--866.

Published in
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464
General Chairs: David Maier (Portland State University, USA), Rachel Pottinger (University of British Columbia, Canada)
Program Chairs: AnHai Doan (University of Wisconsin, USA), Wang-Chiew Tan (Megagon Labs, USA)
Publications Chairs: Abdussalam Alawini (University of Illinois at Urbana-Champaign, USA), Hung Q. Ngo (RelationalAI, USA)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 May 2020
Author Tags
distance measures
elastic measures
embedding measures
kernel functions
lock-step measures
nearest-neighbor classifier
sliding measures
statistical analysis
time series
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%