ABSTRACT
We present a novel framework for multivariate time series representation learning based on the transformer encoder architecture. The framework includes an unsupervised pre-training scheme, which can offer substantial performance benefits over fully supervised learning on downstream tasks, both with and even without leveraging additional unlabeled data, i.e., by reusing the existing data samples. Evaluating our framework on several public multivariate time series datasets from various domains and with diverse characteristics, we demonstrate that it performs significantly better than the best currently available methods for regression and classification, even for datasets that consist of only a few hundred training samples. Given the pronounced interest in unsupervised learning for nearly all domains in the sciences and in industry, these findings represent an important landmark, presenting the first unsupervised method shown to push the limits of state-of-the-art performance for multivariate time series regression and classification.
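To make the setup concrete, below is a minimal sketch of a transformer encoder pre-trained with a masked-value reconstruction objective on multivariate time series. The class and function names (`TSEncoder`, `pretrain_step`), the hyperparameters, and the per-entry Bernoulli masking are our illustrative assumptions, not the paper's reference implementation (which, for instance, may mask contiguous per-variable segments).

```python
# Hedged sketch: transformer encoder over multivariate time series,
# pre-trained by regressing masked input values (denoising objective).
import torch
import torch.nn as nn

class TSEncoder(nn.Module):
    def __init__(self, n_vars: int, d_model: int = 64, n_heads: int = 8,
                 n_layers: int = 3, max_len: int = 512):
        super().__init__()
        self.input_proj = nn.Linear(n_vars, d_model)   # project variables to model dim
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))  # learnable positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.output_proj = nn.Linear(d_model, n_vars)  # reconstruct all variables

    def forward(self, x):                              # x: (batch, time, n_vars)
        h = self.input_proj(x) + self.pos_emb[:, : x.size(1)]
        return self.output_proj(self.encoder(h))

def pretrain_step(model, x, optimizer, mask_ratio: float = 0.15):
    """One unsupervised step: hide random (time, variable) entries and
    compute MSE against the original values only at those positions."""
    mask = torch.rand_like(x) < mask_ratio             # True where input is hidden
    pred = model(x.masked_fill(mask, 0.0))
    loss = ((pred - x) ** 2)[mask].mean()              # loss restricted to masked entries
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: pre-train on the task's own samples (no extra unlabeled data),
# then fine-tune the encoder under a regression/classification head.
model = TSEncoder(n_vars=6)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 128, 6)                            # batch of 32 series, length 128
loss = pretrain_step(model, x, opt)
```

Note how this scheme needs no labels: the same training samples serve as their own reconstruction targets, which is what allows pre-training to help even without additional unlabeled data.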