Long-Term Anticipation of Activities with Cycle Consistency

  • Conference paper
  • Published in: Pattern Recognition (DAGM GCPR 2020)
  • Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12544)

Abstract

With the success of deep learning methods in analyzing activities in videos, more attention has recently been directed towards anticipating future activities. However, most of the work on anticipation either analyzes a partially observed activity or predicts the next action class. Recently, new approaches have been proposed that extend the prediction horizon up to several minutes into the future and anticipate a sequence of future activities, including their durations. While these works decouple the semantic interpretation of the observed sequence from the anticipation task, we propose a framework that anticipates future activities directly from the features of the observed frames and is trained in an end-to-end fashion. Furthermore, we introduce a cycle consistency loss over time by predicting the past activities given the predicted future. Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.
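To make the training objective concrete, the following is a minimal sketch, not the authors' implementation, of the cycle consistency idea: a forward model anticipates future activity labels from the observed frame features, a backward model predicts the past activities from the anticipated future, and the two prediction losses are combined. The module and function names (SeqPredictor, training_step), the GRU encoder-decoder choice, the feature dimensionality, and the use of frame-wise labels instead of segments with durations are all illustrative assumptions.

import torch
import torch.nn as nn

NUM_CLASSES = 48   # assumption: number of action classes (e.g. Breakfast)
FEAT_DIM = 2048    # assumption: dimensionality of the per-frame features

class SeqPredictor(nn.Module):
    """GRU encoder-decoder mapping an input sequence to per-step class logits."""
    def __init__(self, in_dim, num_classes, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(num_classes, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x, out_len):
        _, h = self.encoder(x)                                # summarize the input sequence
        dec_in = x.new_zeros(x.size(0), out_len, self.decoder.input_size)
        out, _ = self.decoder(dec_in, h)                      # roll out the prediction horizon
        return self.classifier(out)                           # (batch, out_len, num_classes)

forward_model = SeqPredictor(FEAT_DIM, NUM_CLASSES)      # observed features -> future labels
backward_model = SeqPredictor(NUM_CLASSES, NUM_CLASSES)  # predicted future  -> past labels
ce = nn.CrossEntropyLoss()

def training_step(obs_feats, past_labels, future_labels, cycle_weight=0.5):
    """obs_feats: (B, T_obs, FEAT_DIM); past_labels: (B, T_obs); future_labels: (B, T_fut)."""
    # Anticipate the future activities directly from the observed frame features.
    future_logits = forward_model(obs_feats, future_labels.size(1))
    loss_future = ce(future_logits.flatten(0, 1), future_labels.flatten())

    # Cycle consistency over time: predict the past activities from the predicted future.
    past_logits = backward_model(future_logits.softmax(-1), past_labels.size(1))
    loss_cycle = ce(past_logits.flatten(0, 1), past_labels.flatten())

    return loss_future + cycle_weight * loss_cycle

In this sketch the cycle term only adds a second prediction pass over the anticipated labels, so the whole pipeline stays differentiable and can be trained end-to-end from the frame features.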


Acknowledgments

The work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – GA 1927/4-1 (FOR 2535 Anticipating Human Behavior) and the ERC Starting Grant ARCA (677650).

Author information

Corresponding author

Correspondence to Yazan Abu Farha.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 144 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Abu Farha, Y., Ke, Q., Schiele, B., Gall, J. (2021). Long-Term Anticipation of Activities with Cycle Consistency. In: Akata, Z., Geiger, A., Sattler, T. (eds) Pattern Recognition. DAGM GCPR 2020. Lecture Notes in Computer Science, vol 12544. Springer, Cham. https://doi.org/10.1007/978-3-030-71278-5_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-71278-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71277-8

  • Online ISBN: 978-3-030-71278-5

  • eBook Packages: Computer Science, Computer Science (R0)
