Long-Term Anticipation of Activities with Cycle Consistency

Abu Farha, Yazan; Ke, Qiuhong; Schiele, Bernt; Gall, Juergen

doi:10.1007/978-3-030-71278-5_12

Yazan Abu Farha¹¹,
Qiuhong Ke¹²,
Bernt Schiele¹³ &
…
Juergen Gall¹¹

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12544))

Included in the following conference series:

DAGM German Conference on Pattern Recognition

1186 Accesses
4 Citations

Abstract

With the success of deep learning methods in analyzing activities in videos, more attention has recently been focused towards anticipating future activities. However, most of the work on anticipation either analyzes a partially observed activity or predicts the next action class. Recently, new approaches have been proposed to extend the prediction horizon up to several minutes in the future and that anticipate a sequence of future activities including their durations. While these works decouple the semantic interpretation of the observed sequence from the anticipation task, we propose a framework for anticipating future activities directly from the features of the observed frames and train it in an end-to-end fashion. Furthermore, we introduce a cycle consistency loss over time by predicting the past activities given the predicted future. Our framework achieves state-of-the-art results on two datasets: the Breakfast dataset and 50Salads.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Abu Farha, Y., Gall, J.: MS-TCN: multi-stage temporal convolutional network for action segmentation. In: CVPR (2019)
Google Scholar
Abu Farha, Y., Gall, J.: Uncertainty-aware anticipation of activities. In: ICCV Workshops (2019)
Google Scholar
Abu Farha, Y., Richard, A., Gall, J.: When will you do what?-Anticipating temporal occurrences of activities. In: CVPR (2018)
Google Scholar
Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., Savarese, S.: Social LSTM: human trajectory prediction in crowded spaces. In: CVPR (2016)
Google Scholar
Bhattacharyya, A., Fritz, M., Schiele, B.: Bayesian prediction of future street scenes using synthetic likelihoods. In: ICLR (2019)
Google Scholar
Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: CVPR (2017)
Google Scholar
Dwibedi, D., Aytar, Y., Tompson, J., Sermanet, P., Zisserman, A.: Temporal cycle-consistency learning. In: CVPR (2019)
Google Scholar
Furnari, A., Battiato, S., Farinella, G.M.: Leveraging uncertainty to rethink loss functions and evaluation measures for egocentric action anticipation. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11133, pp. 389–405. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11021-5_24
Chapter Google Scholar
Furnari, A., Farinella, G.M.: What would you expect? Anticipating egocentric actions with rolling-unrolling LSTMs and modality attention. In: ICCV (2019)
Google Scholar
Gammulle, H., Denman, S., Sridharan, S., Fookes, C.: Forecasting future action sequences with neural memory networks. In: BMVC (2019)
Google Scholar
Gammulle, H., Denman, S., Sridharan, S., Fookes, C.: Predicting the future: A jointly learnt model for action anticipation. In: ICCV (2019)
Google Scholar
Gao, J., Yang, Z., Nevatia, R.: RED: reinforced encoder-decoder networks for action anticipation. In: BMVC (2017)
Google Scholar
Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: CVPR (2017)
Google Scholar
Hoai, M., De la Torre, F.: Max-margin early event detectors. IJCV 107(2), 191–202 (2014). https://doi.org/10.1007/s11263-013-0683-3
Article MathSciNet Google Scholar
Jain, A., Zamir, A.R., Savarese, S., Saxena, A.: Structural-RNN: deep learning on spatio-temporal graphs. In: CVPR (2016)
Google Scholar
Ke, Q., Fritz, M., Schiele, B.: Time-conditioned action anticipation in one shot. In: CVPR (2019)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Google Scholar
Kitani, K.M., Ziebart, B.D., Bagnell, J.A., Hebert, M.: Activity forecasting. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 201–214. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_15
Chapter Google Scholar
Koppula, H.S., Saxena, A.: Anticipating human activities using object affordances for reactive robotic response. TPAMI 38(1), 14–29 (2016)
Article Google Scholar
Kuehne, H., Arslan, A., Serre, T.: The language of actions: recovering the syntax and semantics of goal-directed human activities. In: CVPR (2014)
Google Scholar
Lan, T., Chen, T.-C., Savarese, S.: A hierarchical representation for future action prediction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 689–704. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_45
Chapter Google Scholar
Lea, C., Flynn, M.D., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks for action segmentation and detection. In: CVPR (2017)
Google Scholar
Liang, J., Jiang, L., Niebles, J.C., Hauptmann, A.G., Fei-Fei, L.: Peeking into the future: predicting future person activities and locations in videos. In: CVPR (2019)
Google Scholar
Liang, X., Lee, L., Dai, W., Xing, E.P.: Dual motion GAN for future-flow embedded video prediction. In: ICCV (2017)
Google Scholar
Luc, P., Neverova, N., Couprie, C., Verbeek, J., LeCun, Y.: Predicting deeper into the future of semantic segmentation. In: ICCV (2017)
Google Scholar
Ma, S., Sigal, L., Sclaroff, S.: Learning activity progression in LSTMs for activity detection and early detection. In: CVPR (2016)
Google Scholar
Mahmud, T., Billah, M., Hasan, M., Roy-Chowdhury, A.K.: Captioning near-future activity sequences. arXiv (2019)
Google Scholar
Mahmud, T., Hasan, M., Roy-Chowdhury, A.K.: Joint prediction of activity labels and starting times in untrimmed videos. In: ICCV (2017)
Google Scholar
Martinez, J., Black, M.J., Romero, J.: On human motion prediction using recurrent neural networks. In: CVPR (2017)
Google Scholar
Mathieu, M., Couprie, C., LeCun, Y.: Deep multi-scale video prediction beyond mean square error. In: ICLR (2016)
Google Scholar
Mehrasa, N., Jyothi, A.A., Durand, T., He, J., Sigal, L., Mori, G.: A variational auto-encoder model for stochastic point processes. In: CVPR (2019)
Google Scholar
Miech, A., Laptev, I., Sivic, J., Wang, H., Torresani, L., Tran, D.: Leveraging the present to anticipate the future in videos. In: CVPR Workshops (2019)
Google Scholar
Richard, A., Kuehne, H., Gall, J.: Weakly supervised action learning with RNN based fine-to-coarse modeling. In: CVPR (2017)
Google Scholar
Rodriguez, C., Fernando, B., Li, H.: Action anticipation by predicting future dynamic images. In: ECCV Workshops. Springer (2018)
Google Scholar
Ruiz, A.H., Gall, J., Moreno-Noguer, F.: Human motion prediction via spatio-temporal inpainting. In: ICCV (2019)
Google Scholar
Ryoo, M.S.: Human activity prediction: early recognition of ongoing activities from streaming videos. In: ICCV (2011)
Google Scholar
Sadegh Aliakbarian, M., et al.: Encouraging LSTMs to anticipate actions very early. In: ICCV (2017)
Google Scholar
Sener, F., Yao, A.: Zero-shot anticipation for instructional activities. In: ICCV (2019)
Google Scholar
Shi, Y., Fernando, B., Hartley, R.: Action anticipation with RBF kernelized feature mapping RNN. In: ECCV (2018)
Google Scholar
Srivastava, N., Mansimov, E., Salakhudinov, R.: Unsupervised learning of video representations using LSTMs. In: ICML, pp. 843–852 (2015)
Google Scholar
Stein, S., McKenna, S.J.: Combining embedded accelerometers with computer vision for recognizing food preparation activities. In: UbiComp (2013)
Google Scholar
Sun, C., Shrivastava, A., Vondrick, C., Sukthankar, R., Murphy, K., Schmid, C.: Relational action forecasting. In: CVPR (2019)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)
Google Scholar
Vondrick, C., Pirsiavash, H., Torralba, A.: Anticipating visual representations from unlabeled video. In: CVPR (2016)
Google Scholar
Wang, X., Jabri, A., Efros, A.A.: Learning correspondence from the cycle-consistency of time. In: CVPR (2019)
Google Scholar
Zeng, K.H., Shen, W.B., Huang, D.A., Sun, M., Carlos Niebles, J.: Visual forecasting by imitating dynamics in natural sequences. In: ICCV (2017)
Google Scholar
Zhou, T., Krahenbuhl, P., Aubry, M., Huang, Q., Efros, A.A.: Learning dense correspondence via 3d-guided cycle consistency. In: CVPR (2016)
Google Scholar
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: ICCV (2017)
Google Scholar

Download references

Acknowledgments

The work has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – GA 1927/4-1 (FOR 2535 Anticipating Human Behavior) and the ERC Starting Grant ARCA (677650).

Author information

Authors and Affiliations

University of Bonn, Bonn, Germany
Yazan Abu Farha & Juergen Gall
The University of Melbourne, Melbourne, Australia
Qiuhong Ke
MPI Informatics, Saarbrücken, Germany
Bernt Schiele

Authors

Yazan Abu Farha
View author publications
You can also search for this author in PubMed Google Scholar
Qiuhong Ke
View author publications
You can also search for this author in PubMed Google Scholar
Bernt Schiele
View author publications
You can also search for this author in PubMed Google Scholar
Juergen Gall
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yazan Abu Farha .

Editor information

Editors and Affiliations

University of Tübingen, Tübingen, Germany
Zeynep Akata
University of Tübingen, Tübingen, Germany
Andreas Geiger
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 144 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Abu Farha, Y., Ke, Q., Schiele, B., Gall, J. (2021). Long-Term Anticipation of Activities with Cycle Consistency. In: Akata, Z., Geiger, A., Sattler, T. (eds) Pattern Recognition. DAGM GCPR 2020. Lecture Notes in Computer Science(), vol 12544. Springer, Cham. https://doi.org/10.1007/978-3-030-71278-5_12

Download citation

DOI: https://doi.org/10.1007/978-3-030-71278-5_12
Published: 17 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71277-8
Online ISBN: 978-3-030-71278-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics