Skip to main content

History Repeats Itself: Human Motion Prediction via Motion Attention

  • Conference paper
  • First Online:
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12359))

Included in the following conference series:

Abstract

Human motion prediction aims to forecast future human poses given a past motion. Whether based on recurrent or feed-forward neural networks, existing methods fail to model the observation that human motion tends to repeat itself, even for complex sports actions and cooking activities. Here, we introduce an attention-based feed-forward network that explicitly leverages this observation. In particular, instead of modeling frame-wise attention via pose similarity, we propose to extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences. Aggregating the relevant past motions and processing the result with a graph convolutional network allows us to effectively exploit motion patterns from the long-term history to predict the future poses. Our experiments on Human3.6M, AMASS and 3DPW evidence the benefits of our approach for both periodical and non-periodical actions. Thanks to our attention model, it yields state-of-the-art results on all three datasets. Our code is available at https://github.com/wei-mao-2019/HisRepItself.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Described at https://github.com/nghorbani/amass.

  2. 2.

    Available at https://amass.is.tue.mpg.de/dataset.

References

  1. Arjovsky, M., Bottou, L.: Towards principled methods for training generative adversarial networks. In: ICLR (2017)

    Google Scholar 

  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2015)

    Google Scholar 

  3. Brand, M., Hertzmann, A.: Style machines. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pp. 183–192. ACM Press/Addison-Wesley Publishing Company (2000)

    Google Scholar 

  4. Butepage, J., Black, M.J., Kragic, D., Kjellstrom, H.: Deep representation learning for human motion prediction and classification. In: CVPR (July 2017)

    Google Scholar 

  5. Fragkiadaki, K., Levine, S., Felsen, P., Malik, J.: Recurrent network models for human dynamics. In: ICCV. pp. 4346–4354 (2015)

    Google Scholar 

  6. Gong, H., Sim, J., Likhachev, M., Shi, J.: Multi-hypothesis motion planning for visual object tracking. In: ICCV, pp. 619–626. IEEE (2011)

    Google Scholar 

  7. Gopalakrishnan, A., Mali, A., Kifer, D., Giles, L., Ororbia, A.G.: A neural temporal model for human motion prediction. In: CVPR, pp. 12116–12125 (2019)

    Google Scholar 

  8. Gui, L.Y., Wang, Y.X., Liang, X., Moura, J.M.: Adversarial geometry-aware human motion prediction. In: ECCV, pp. 786–803 (2018)

    Google Scholar 

  9. Hernandez, A., Gall, J., Moreno-Noguer, F.: Human motion prediction via spatio-temporal inpainting. In: ICCV, pp. 7134–7143 (2019)

    Google Scholar 

  10. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6m: large scale datasets and predictive methods for 3d human sensing in natural environments. TPAMI 36(7), 1325–1339 (2014)

    Article  Google Scholar 

  11. Jain, A., Zamir, A.R., Savarese, S., Saxena, A.: Structural-RNN: Deep learning on spatio-temporal graphs. In: CVPR, pp. 5308–5317 (2016)

    Google Scholar 

  12. Kiros, R., et al.: Skip-thought vectors. In: NIPS, pp. 3294–3302 (2015)

    Google Scholar 

  13. Koppula, H.S., Saxena, A.: Anticipating human activities for reactive robotic response. In: IROS, p. 2071. Tokyo (2013)

    Google Scholar 

  14. Kovar, L., Gleicher, M., Pighin, F.: Motion graphs. In: ACM SIGGRAPH 2008 classes, pp. 1–10 (2008)

    Google Scholar 

  15. Koppula, H.S., Saxena, A.: Anticipating human activities for reactive robotic response. In: IROS, p. 2071. Tokyo (2013)

    Google Scholar 

  16. Li, C., Zhang, Z., Lee, W.S., Lee, G.H.: Convolutional sequence to sequence model for human dynamics. In: CVPR, pp. 5226–5234 (2018)

    Google Scholar 

  17. Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: SMPL: a skinned multi-person linear model. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 34(6), 248:1–248:16 (2015)

    Google Scholar 

  18. Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., Black, M.J.: Amass: archive of motion capture as surface shapes. In: ICCV (October 2019). https://amass.is.tue.mpg.de

  19. Mao, W., Liu, M., Salzmann, M., Li, H.: Learning trajectory dependencies for human motion prediction. In: ICCV, pp. 9489–9497 (2019)

    Google Scholar 

  20. von Marcard, T., Henschel, R., Black, M., Rosenhahn, B., Pons-Moll, G.: Recovering accurate 3d human pose in the wild using IMUs and a moving camera. In: ECCV (September 2018)

    Google Scholar 

  21. Martinez, J., Black, M.J., Romero, J.: On human motion prediction using recurrent neural networks. In: CVPR (July 2017)

    Google Scholar 

  22. Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: ICML, pp. 807–814 (2010)

    Google Scholar 

  23. Pavllo, D., Feichtenhofer, C., Auli, M., Grangier, D.: Modeling human motion with quaternion-based neural networks. In: IJCV, pp. 1–18 (2019)

    Google Scholar 

  24. Romero, J., Tzionas, D., Black, M.J.: Embodied hands: modeling and capturing hands and bodies together. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 36(6), 245 (2017)

    Google Scholar 

  25. Runia, T.F., Snoek, C.G., Smeulders, A.W.: Real-world repetition estimation by div, grad and curl. In: CVPR, pp. 9009–9017 (2018)

    Google Scholar 

  26. Sidenbladh, Hedvig., Black, Michael J., Sigal, Leonid: Implicit probabilistic models of human motion for synthesis and tracking. In: Heyden, Anders, Sparr, Gunnar, Nielsen, Mads, Johansen, Peter (eds.) ECCV 2002. LNCS, vol. 2350, pp. 784–800. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47969-4_52

    Chapter  Google Scholar 

  27. Sutskever, I., Martens, J., Hinton, G.E.: Generating text with recurrent neural networks. In: ICML, pp. 1017–1024 (2011)

    Google Scholar 

  28. Tang, Y., Ma, L., Liu, W., Zheng, W.S.: Long-term human motion prediction by modeling motion context and enhancing motion dynamics. IJCAI (July 2018). https://doi.org/10.24963/ijcai.2018/130, http://dx.doi.org/10.24963/ijcai.2018/130

  29. Vaswani, A., et al.: Attention is all you need. In: NIPS, pp. 5998–6008 (2017)

    Google Scholar 

  30. Wang, J.M., Fleet, D.J., Hertzmann, A.: Gaussian process dynamical models for human motion. TPAMI 30(2), 283–298 (2008)

    Article  Google Scholar 

Download references

Acknowledgements

This research was supported in part by the Australia Research Council DECRA Fellowship (DE180100628) and ARC Discovery Grant (DP200102274). The authors would like to thank NVIDIA for the donated GPU (Titan V).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wei Mao .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 2649 KB)

Supplementary material 2 (pdf 311 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Mao, W., Liu, M., Salzmann, M. (2020). History Repeats Itself: Human Motion Prediction via Motion Attention. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12359. Springer, Cham. https://doi.org/10.1007/978-3-030-58568-6_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58568-6_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58567-9

  • Online ISBN: 978-3-030-58568-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics