Uncertainty-DTW for Time Series and Sequences

Wang, Lei; Koniusz, Piotr

doi:10.1007/978-3-031-19803-8_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13681)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Dynamic Time Warping (DTW) is used for matching pairs of sequences and is celebrated in applications such as forecasting the evolution of time series, clustering time series, and matching sequence pairs in few-shot action recognition. The transportation plan of DTW contains a set of paths; each path matches frames between two sequences under a varying degree of time warping, to account for varying temporal intra-class dynamics of actions. However, as DTW is the smallest distance among all paths, it may be affected by the feature uncertainty, which varies across time steps/frames. Thus, in this paper, we propose to model the so-called aleatoric uncertainty of a differentiable (soft) version of DTW. To this end, we model the heteroscedastic aleatoric uncertainty of each path by the product of likelihoods from Normal distributions, each capturing the variance of a pair of frames. (The path distance is the sum of base distances between features of pairs of frames of the path.) Maximum Likelihood Estimation (MLE) applied to a path yields two terms: (i) a sum of Euclidean distances weighted by the inverse variance, and (ii) a sum of log-variance regularization terms. Thus, our uncertainty-DTW is the smallest weighted path distance among all paths, and the regularization term (a penalty for high uncertainty) is the aggregate of log-variances along the path. The distance and the regularization term can be used in various objectives. We showcase forecasting the evolution of time series, estimating the Fréchet mean of time series, and supervised/unsupervised few-shot action recognition on articulated 3D human body joints.
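
The two terms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the code link below for that). It assumes the hard-minimum limit of DTW rather than the differentiable soft version, squared Euclidean base distances, and a variance matrix supplied by the caller. The function name `udtw_hard`, the argument `sigma2`, and the weight `beta` are hypothetical, and constant factors from the Normal likelihood are omitted.

```python
import numpy as np

def udtw_hard(x, y, sigma2, beta=1.0):
    """Hard-minimum sketch of an uncertainty-weighted DTW (illustrative only).

    x: (n, d) array, y: (m, d) array of frame features.
    sigma2: (n, m) array of positive variances, one per frame pair (assumed given).
    Returns (weighted path distance, log-variance penalty, path).
    """
    n, m = len(x), len(y)
    # Squared Euclidean base distances between all frame pairs.
    base = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Term (i): base distances weighted by the inverse variance.
    cost = base / sigma2

    # Standard DTW recursion on the weighted costs.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev

    # Backtrack from the end to recover the minimizing path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda c: acc[c])
        path.append((i - 1, j - 1))
    path.reverse()

    # Term (ii): aggregate of log-variances along the selected path.
    penalty = beta * sum(np.log(sigma2[p]) for p in path)
    return acc[n, m], penalty, path

# Usage: two short 1-D sequences with a uniform (hypothetical) variance.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.0], [2.0]])
distance, penalty, path = udtw_hard(x, y, sigma2=np.full((3, 2), 2.0))
```

In this sketch, a larger variance for a frame pair lowers that pair's weighted distance but raises the log-variance penalty, which is the trade-off the abstract describes.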

L. Wang and P. Koniusz—Equal contribution. Code: https://github.com/LeiWangR/uDTW.


Notes

1. We use temporal blocks, as they were shown to be more robust than frame-wise few-shot action recognition (FSAR) models [50].

References

Abid, A., Zou, J.: AutoWarp: learning a warping distance from unlabeled time series using sequence autoencoders. In: NIPS 2018. Curran Associates Inc., Red Hook (2018)
Ben-Ari, R., Shpigel Nacson, M., Azulai, O., Barzelay, U., Rotman, D.: TAEN: temporal aware embedding network for few-shot action recognition. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2780–2788 (2021)
Blondel, M., Mensch, A., Vert, J.P.: Differentiable divergences between time series. In: Banerjee, A., Fukumizu, K. (eds.) Proceedings of the 24th International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 130, pp. 3853–3861. PMLR (2021)
Cao, K., Ji, J., Cao, Z., Chang, C.Y., Niebles, J.C.: Few-shot video classification via temporal alignment. In: CVPR (2020)
Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6m: large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1325–1339 (2014)
Cuturi, M.: Fast global alignment kernels. In: International Conference on Machine Learning (ICML) (2011)
Cuturi, M., Blondel, M.: Soft-DTW: a differentiable loss function for time-series. In: International Conference on Machine Learning (ICML) (2017)
Dau, H.A., et al.: The UCR Time Series Classification Archive (2018). https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
Dempster, A., Schmidt, D.F., Webb, G.I.: MINIROCKET: a very fast (almost) deterministic transform for time series classification. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, KDD 2021, pp. 248–257. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3447548.3467231
Donahue, J., Dieleman, S., Binkowski, M., Elsen, E., Simonyan, K.: End-to-end adversarial text-to-speech. In: International Conference on Learning Representations (2021)
Dosovitskiy, A., et al.: An image is worth \(16 \times 16\) words: transformers for image recognition at scale. In: International Conference on Learning Representations (2020)
García-García, D., Parrado Hernández, E., Díaz-de María, F.: A new distance measure for model-based sequence clustering. IEEE Trans. Pattern Anal. Mach. Intell. 31(7), 1325–1331 (2009). https://doi.org/10.1109/TPAMI.2008.268
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer Series in Statistics, Springer, New York (2001)
Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach. Learn. 110(3), 457–506 (2021). https://doi.org/10.1007/s10994-021-05946-3
Indrayan, A.: Medical Biostatistics, 2nd edn. Chapman & Hall/CRC, Boca Raton (2008). https://www.loc.gov/catdir/toc/ecip0723/2007030353.html
Kay, W., et al.: The kinetics human action video dataset (2017)
Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)
Kiureghian, A.D., Ditlevsen, O.: Aleatory or epistemic? Does it matter? Struct. Saf. 31(2), 105–112 (2009). https://doi.org/10.1016/j.strusafe.2008.06.020. Risk Acceptance and Risk Communication
Koniusz, P., Mikolajczyk, K.: Soft assignment of visual words as linear coordinate coding and optimisation of its reconstruction error. In: 2011 18th IEEE International Conference on Image Processing, pp. 2413–2416 (2011). https://doi.org/10.1109/ICIP.2011.6116129
Koniusz, P., Wang, L., Cherian, A.: Tensor representations for action recognition. TPAMI 44, 648–665 (2020)
Koniusz, P., Yan, F., Mikolajczyk, K.: Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection. Comput. Vis. Image Underst. 117(5), 479–492 (2013). https://doi.org/10.1016/j.cviu.2012.10.010
Li, S., et al.: TTAN: two-stage temporal alignment network for few-shot action recognition. CoRR (2021)
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45, 503–528 (1989)
Liu, J., Shahroudy, A., Perez, M., Wang, G., Duan, L.Y., Kot, A.C.: NTU RGB+D 120: a large-scale benchmark for 3D human activity understanding. IEEE Trans. Pattern Anal. Mach. Intell. (2019). https://doi.org/10.1109/TPAMI.2019.2916873
Liu, L., Wang, L., Liu, X.: In defense of soft-assignment coding. In: 2011 International Conference on Computer Vision, pp. 2486–2493 (2011). https://doi.org/10.1109/ICCV.2011.6126534
Lohit, S., Wang, Q., Turaga, P.: Temporal transformer networks: joint learning of invariant and discriminative time warping. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
Martinez, J., Hossain, R., Romero, J., Little, J.J.: A simple yet effective baseline for 3D human pose estimation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2659–2668 (2017). https://doi.org/10.1109/ICCV.2017.288
Matthies, H.G.: Quantifying uncertainty: modern computational representation of probability and applications. In: Ibrahimbegovic, A., Kozar, I. (eds.) Extreme Man-Made and Natural Hazards in Dynamics of Structures, pp. 105–135. Springer, Netherlands, Dordrecht (2007). https://doi.org/10.1007/978-1-4020-5656-7_4
Memmesheimer, R., Häring, S., Theisen, N., Paulus, D.: Skeleton-DML: deep metric learning for skeleton-based one-shot action recognition (2021)
Memmesheimer, R., Theisen, N., Paulus, D.: Signal level deep metric learning for multimodal one-shot action recognition (2020)
Mensch, A., Blondel, M.: Differentiable dynamic programming for structured prediction and attention. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 3462–3471. PMLR (2018)
Mina, B., Zoumpourlis, G., Patras, I.: Tarn: temporal attentive relation network for few-shot and zero-shot action recognition. In: Sidorov, K., Hicks, Y. (eds.) Proceedings of the British Machine Vision Conference (BMVC), pp. 130.1–130.14. BMVA Press (2019). https://doi.org/10.5244/C.33.130
Perrett, T., Masullo, A., Burghardt, T., Mirmehdi, M., Damen, D.: Temporal-relational crosstransformers for few-shot action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 475–484 (2021)
Ramachandran, P., Liu, P.J., Le, Q.V.: Unsupervised pretraining for sequence to sequence learning (2018)
Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 26(1), 43–49 (1978). https://doi.org/10.1109/TASSP.1978.1163055
Shahroudy, A., Liu, J., Ng, T.T., Wang, G.: NTU RGB+D: a large scale dataset for 3D human activity analysis. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
Snell, J., Swersky, K., Zemel, R.S.: Prototypical networks for few-shot learning. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017, pp. 4077–4087 (2017)
Su, B., Hua, G.: Order-preserving optimal transport for distances between sequences. IEEE Trans. Pattern Anal. Mach. Intell. 41(12), 2961–2974 (2019). https://doi.org/10.1109/TPAMI.2018.2870154
Su, B., Wen, J.R.: Temporal alignment prediction for supervised representation learning and few-shot sequence classification. In: International Conference on Learning Representations (2022)
Su, B., Zhou, J., Wu, Y.: Order-preserving Wasserstein discriminant analysis. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9884–9893 (2019). https://doi.org/10.1109/ICCV.2019.00998
Tan, S., Yang, R.: Learning similarity: feature-aligning network for few-shot action recognition. In: International Joint Conference on Neural Networks (IJCNN), pp. 1–7 (2019)
Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: Lee, D.D., Sugiyama, M., von Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016, pp. 3630–3638 (2016)
Wang, L.: Analysis and evaluation of Kinect-based action recognition algorithms. Master’s thesis, School of the Computer Science and Software Engineering, The University of Western Australia (2017)
Wang, L., Huynh, D.Q., Koniusz, P.: A comparative review of recent Kinect-based action recognition algorithms. IEEE Trans. Image Process. 29, 15–28 (2020)
Wang, L., Huynh, D.Q., Mansour, M.R.: Loss switching fusion with similarity search for video classification. In: ICIP (2019)
Wang, L., Koniusz, P.: Self-supervising action recognition by statistical moment and subspace descriptors, pp. 4324–4333. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3474085.3475572
Wang, L., Koniusz, P., Huynh, D.Q.: Hallucinating IDT descriptors and I3D optical flow features for action recognition with CNNs. In: The IEEE International Conference on Computer Vision (ICCV) (2019)
Yan, S., Xiong, Y., Lin, D.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI (2018)
Yang, C.H.H., Tsai, Y.Y., Chen, P.Y.: Voice2series: reprogramming acoustic models for time series classification. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 11808–11819. PMLR (2021)
Zhang, H., Zhang, L., Qi, X., Li, H., Torr, P.H.S., Koniusz, P.: Few-shot action recognition with permutation-invariant attention. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 525–542. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58558-7_31
Zhu, H., Koniusz, P.: Simple spectral graph convolution. In: International Conference on Learning Representations (ICLR) (2021)


Author information

Authors and Affiliations

Australian National University, Canberra, Australia
Lei Wang & Piotr Koniusz
Data61/CSIRO, Canberra, Australia
Lei Wang & Piotr Koniusz

Authors

Lei Wang
Piotr Koniusz

Corresponding author

Correspondence to Piotr Koniusz.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 668 KB)


Cite this paper

Wang, L., Koniusz, P. (2022). Uncertainty-DTW for Time Series and Sequences. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13681. Springer, Cham. https://doi.org/10.1007/978-3-031-19803-8_11


DOI: https://doi.org/10.1007/978-3-031-19803-8_11
Published: 23 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19802-1
Online ISBN: 978-3-031-19803-8
eBook Packages: Computer Science, Computer Science (R0)
