Abstract
The objective of this paper is to recognize gestures in videos – both localizing the gesture and classifying it into one of multiple classes.
We show that the performance of a gesture classifier learnt from a single (strongly supervised) training example can be boosted significantly using a ‘reservoir’ of weakly supervised gesture examples (and that the performance exceeds learning from the one-shot example or reservoir alone). The one-shot example and weakly supervised reservoir are from different ‘domains’ (different people, different videos, continuous or non-continuous gesturing, etc.), and we propose a domain adaptation method for human pose and hand shape that enables gesture learning methods to generalise between them. We also show the benefits of using the recently introduced Global Alignment Kernel [12], instead of the standard Dynamic Time Warping that is generally used for time alignment.
The domain adaptation and learning methods are evaluated on two large scale challenging gesture datasets: one for sign language, and the other for Italian hand gestures. In both cases performance exceeds the previous published results, including the best skeleton-classification-only entry in the 2013 ChaLearn challenge.
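To make the time-alignment comparison concrete, here is a minimal NumPy sketch (not the authors' implementation) of the two alternatives the abstract contrasts: standard Dynamic Time Warping, which takes the single best alignment path, and the Global Alignment Kernel of Cuturi et al. [21,22], which sums a soft score over all alignment paths. The Gaussian local kernel and the bandwidth `sigma` are assumptions for illustration; real features would be per-frame pose/hand-shape descriptors rather than scalars.

```python
import numpy as np

def dtw(x, y):
    """Dynamic Time Warping distance: cost of the single best alignment."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def gak(x, y, sigma=1.0):
    """Global Alignment Kernel: soft sum over *all* alignments (Cuturi's recursion)."""
    n, m = len(x), len(y)
    K = np.zeros((n + 1, m + 1))
    K[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Gaussian local similarity between frames (an assumed choice here).
            k_local = np.exp(-((x[i - 1] - y[j - 1]) ** 2) / (2 * sigma ** 2))
            # Accumulate over all alignment paths, not just the best one.
            K[i, j] = k_local * (K[i - 1, j] + K[i, j - 1] + K[i - 1, j - 1])
    return K[n, m]
```

Because GAK aggregates over every alignment, it yields a positive-definite kernel suitable for SVM training, whereas the min in DTW generally does not.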
References
Ali, S., Shah, M.: Human action recognition in videos using kinematic features and multiple instance learning. IEEE PAMI 32(2), 288–303 (2010)
Baisero, A., Pokorny, F.T., Kragic, D., Ek, C.: The path kernel. In: ICPRAM (2013)
Bojanowski, P., Bach, F., Laptev, I., Ponce, J., Schmid, C., Sivic, J.: Finding actors and actions in movies. In: Proc. ICCV (2013)
Books, M.: The standard dictionary of the British sign language. DVD (2005)
Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: Proc. ICCV (2001)
Bristol Centre for Deaf Studies: Signstation, http://www.signstation.org (accessed March 1, 2014)
Buehler, P., Everingham, M., Zisserman, A.: Learning sign language by watching TV (using weakly aligned subtitles). In: Proc. CVPR (2009)
Chai, X., Li, G., Lin, Y., Xu, Z., Tang, Y., Chen, X., Zhou, M.: Sign language recognition and translation with Kinect. In: Proc. Int. Conf. Autom. Face and Gesture Recog. (2013)
Charles, J., Pfister, T., Everingham, M., Zisserman, A.: Automatic and efficient human pose estimation for sign language videos. IJCV (2013)
Charles, J., Pfister, T., Magee, D., Hogg, D., Zisserman, A.: Domain adaptation for upper body pose tracking in signed TV broadcasts. In: Proc. BMVC (2013)
Cooper, H., Bowden, R.: Learning signs from subtitles: A weakly supervised approach to sign language recognition. In: Proc. CVPR (2009)
Cuturi, M.: Fast global alignment kernels. In: ICML (2011)
Cuturi, M., Vert, J., Birkenes, Ø., Matsui, T.: A kernel for time series based on global alignments. In: ICASSP (2007)
Duchenne, O., Laptev, I., Sivic, J., Bach, F., Ponce, J.: Automatic annotation of human actions in video. In: Proc. CVPR (2009)
Escalera, S., Gonzàlez, J., Baró, X., Reyes, M., Guyon, I., Athitsos, V., Escalante, H., Sigal, L., Argyros, A., Sminchisescu, C.: Chalearn multi-modal gesture recognition 2013: grand challenge and workshop summary. In: ACM MM (2013)
Fanello, S., Gori, I., Metta, G., Odone, F.: Keep it simple and sparse: real-time action recognition. J. Machine Learning Research 14(1), 2617–2640 (2013)
Farhadi, A., Forsyth, D., White, R.: Transfer learning in sign language. In: Proc. CVPR (2007)
Gaidon, A., Harchaoui, Z., Schmid, C.: A time series kernel for action recognition. In: Proc. BMVC (2011)
Guyon, I., Athitsos, V., Jangyodsuk, P., Escalante, H., Hamner, B.: Results and analysis of the ChaLearn gesture challenge 2012. In: Proc. ICPR (2013)
Guyon, I., Athitsos, V., Jangyodsuk, P., Hamner, B., Escalante, H.: ChaLearn gesture challenge: Design and first results. In: CVPR Workshops (2012)
Hariharan, B., Malik, J., Ramanan, D.: Discriminative decorrelation for clustering and classification. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part IV. LNCS, vol. 7575, pp. 459–472. Springer, Heidelberg (2012)
Ke, Y., Sukthankar, R., Hebert, M.: Event detection in crowded videos. In: Proc. ICCV (2007)
Kelly, D., McDonald, J., Markham, C.: Weakly supervised training of a sign language recognition system using multiple instance learning density matrices. Trans. Systems, Man, and Cybernetics 41(2), 526–541 (2011)
Krishnan, R., Sarkar, S.: Similarity measure between two gestures using triplets. In: CVPR Workshops (2013)
Malisiewicz, T., Gupta, A., Efros, A.A.: Ensemble of exemplar-SVMs for object detection and beyond. In: Proc. ICCV (2011)
Nayak, S., Duncan, K., Sarkar, S., Loeding, B.: Finding recurrent patterns from continuous sign language sentences for automated extraction of signs. J. Machine Learning Research 13(1), 2589–2615 (2012)
Pfister, T., Charles, J., Everingham, M., Zisserman, A.: Automatic and efficient long term arm and hand tracking for continuous sign language TV broadcasts. In: Proc. BMVC (2012)
Pfister, T., Charles, J., Zisserman, A.: Large-scale learning of sign language by watching TV (using co-occurrences). In: Proc. BMVC (2013)
Rother, C., Kolmogorov, V., Blake, A.: Grabcut: interactive foreground extraction using iterated graph cuts. In: Proc. ACM SIGGRAPH (2004)
Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing (1978)
Sakoe, H., Chiba, S.: A similarity evaluation of speech patterns by dynamic programming. In: Nat. Meeting of Institute of Electronic Communications Engineers of Japan (1970)
Shimodaira, H., Noma, K., Nakai, M., Sagayama, S.: Dynamic time-alignment kernel in support vector machine. In: NIPS (2001)
Wan, J., Ruan, Q., Li, W., Deng, S.: One-shot learning gesture recognition from RGB-D data using bag of features. J. Machine Learning Research 14(1), 2549–2582 (2013)
Wu, J., Cheng, J., Zhao, C., Lu, H.: Fusing multi-modal features for gesture recognition. In: ICMI (2013)
Zhou, F., De la Torre, F.: Generalized time warping for multi-modal alignment of human motion. In: Proc. CVPR (2012)
© 2014 Springer International Publishing Switzerland
Cite this paper
Pfister, T., Charles, J., Zisserman, A. (2014). Domain-Adaptive Discriminative One-Shot Learning of Gestures. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_52
DOI: https://doi.org/10.1007/978-3-319-10599-4_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4