History Repeats Itself: Human Motion Prediction via Motion Attention

Mao, Wei; Liu, Miaomiao; Salzmann, Mathieu

doi:10.1007/978-3-030-58568-6_28

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12359)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Human motion prediction aims to forecast future human poses given a past motion. Whether based on recurrent or feed-forward neural networks, existing methods fail to model the observation that human motion tends to repeat itself, even for complex sports actions and cooking activities. Here, we introduce an attention-based feed-forward network that explicitly leverages this observation. In particular, instead of modeling frame-wise attention via pose similarity, we propose to extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences. Aggregating the relevant past motions and processing the result with a graph convolutional network allows us to effectively exploit motion patterns from the long-term history to predict the future poses. Our experiments on Human3.6M, AMASS and 3DPW evidence the benefits of our approach for both periodical and non-periodical actions. Thanks to our attention model, it yields state-of-the-art results on all three datasets. Our code is available at https://github.com/wei-mao-2019/HisRepItself.
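
The idea of comparing the current motion context with historical sub-sequences can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes raw pose vectors as features, a negative squared Euclidean distance as the similarity score, and softmax normalization; the function name `motion_attention` and the parameters `context_len` and `horizon` are hypothetical. The graph convolutional network that processes the aggregated motion is omitted.

```python
import numpy as np


def motion_attention(history, context_len, horizon):
    """Weight past sub-sequences by similarity to the most recent frames.

    history:     array of shape (N, D), N observed frames of D-dim poses.
    context_len: number of most recent frames used as the query (assumed name).
    horizon:     number of frames that follow each key inside a value window.
    Returns (weights, aggregated); aggregated has shape (context_len + horizon, D).
    """
    n_frames, _ = history.shape
    window = context_len + horizon
    n_windows = n_frames - window + 1
    if n_windows < 1:
        raise ValueError("history is too short for the chosen window sizes")

    # Query: the last `context_len` observed frames, flattened to one vector.
    query = history[-context_len:].reshape(-1)

    # Keys: the first `context_len` frames of every sliding window.
    keys = np.stack(
        [history[i:i + context_len].reshape(-1) for i in range(n_windows)]
    )
    # Values: the full window, i.e. the key frames plus what followed them.
    values = np.stack([history[i:i + window] for i in range(n_windows)])

    # Similarity score (assumption): negative squared distance to the query.
    scores = -np.sum((keys - query) ** 2, axis=1)

    # Numerically stable softmax over all windows.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights /= weights.sum()

    # Weighted sum of the value windows: shape (window, D).
    aggregated = np.tensordot(weights, values, axes=1)
    return weights, aggregated
```

On a periodic toy sequence, the windows that start at the same phase as the query receive the largest weights, so the tail of `aggregated` reflects what followed similar motions in the past. In a learned model the keys, queries, and values would typically pass through trainable mappings before the scores are computed.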

Notes

1. Described at https://github.com/nghorbani/amass.
2. Available at https://amass.is.tue.mpg.de/dataset.

Acknowledgements

This research was supported in part by the Australian Research Council DECRA Fellowship (DE180100628) and ARC Discovery Grant (DP200102274). The authors would like to thank NVIDIA for the donated GPU (Titan V).

Author information

Authors and Affiliations

Australian National University, Canberra, Australia
Wei Mao & Miaomiao Liu
EPFL–CVLab and ClearSpace, Lausanne, Switzerland
Mathieu Salzmann

Corresponding author

Correspondence to Wei Mao.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 2649 KB)

Supplementary material 2 (pdf 311 KB)

About this paper

Cite this paper

Mao, W., Liu, M., Salzmann, M. (2020). History Repeats Itself: Human Motion Prediction via Motion Attention. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12359. Springer, Cham. https://doi.org/10.1007/978-3-030-58568-6_28

DOI: https://doi.org/10.1007/978-3-030-58568-6_28
Published: 13 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58567-9
Online ISBN: 978-3-030-58568-6
eBook Packages: Computer Science, Computer Science (R0)
