One-Shot Learning Gesture Recognition from RGB-D Data Using Bag of Features

Chapter in Gesture Recognition

Part of the book series: The Springer Series on Challenges in Machine Learning (SSCML)

Abstract

For one-shot learning gesture recognition, two challenges are central: how to extract distinctive features, and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation that fuses RGB-D data, the 3D enhanced motion scale-invariant feature transform (3D EMoSIFT), is proposed. Compared with other features, the new feature set is invariant to scale and rotation and gives more compact and richer visual representations. For learning a discriminative model, all features extracted from the training samples are clustered with the k-means algorithm to learn a visual codebook. Then, unlike traditional bag-of-features (BoF) models, which use vector quantization (VQ) to map each feature to a single visual codeword, a sparse coding method named simultaneous orthogonal matching pursuit (SOMP) is applied, so that each feature is represented by a linear combination of a small number of codewords. Compared with VQ, SOMP yields a much lower reconstruction error and better recognition performance. The proposed approach has been evaluated on the ChaLearn gesture database, and the result ranked among the top-performing techniques in round 2 of the ChaLearn gesture challenge.
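The codebook and coding steps described in the abstract can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: random 8-dimensional vectors stand in for the spatio-temporal descriptors, the codebook size (6), sparsity (3 codewords) and iteration count are arbitrary, and every function name is hypothetical. SOMP follows the greedy scheme of Tropp, Gilbert and Strauss (2006), in which several signals share one support. How the chapter groups features for SOMP and pools the codes into a per-video histogram is not stated in the abstract, so pooling is omitted.

```python
import math
import random


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: list[list[float]], k: int, iters: int = 20, seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's k-means; the returned centres form the visual codebook."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        clusters: list[list[list[float]]] = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda j: sq_dist(f, centers[j]))
            clusters[nearest].append(f)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def vq_code(f: list[float], codebook: list[list[float]]) -> list[float]:
    """Vector quantization: a one-hot code on the nearest codeword."""
    nearest = min(range(len(codebook)), key=lambda j: sq_dist(f, codebook[j]))
    code = [0.0] * len(codebook)
    code[nearest] = 1.0
    return code


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def somp(signals: list[list[float]], codebook: list[list[float]], n_atoms: int) -> list[list[float]]:
    """Simultaneous OMP: all signals share one support of n_atoms codewords."""
    norms = [math.sqrt(dot(d, d)) for d in codebook]
    residuals = [list(x) for x in signals]
    support: list[int] = []
    coefs: list[list[float]] = [[] for _ in signals]
    for _ in range(n_atoms):
        # Greedy step: the unused codeword most correlated with all residuals jointly.
        unused = [j for j in range(len(codebook)) if j not in support]
        best = max(unused, key=lambda j: sum(abs(dot(r, codebook[j])) for r in residuals) / norms[j])
        support.append(best)
        atoms = [codebook[j] for j in support]
        gram = [[dot(p, q) for q in atoms] for p in atoms]
        for s, x in enumerate(signals):
            # Least-squares refit of each signal on the current shared support.
            coefs[s] = solve(gram, [dot(p, x) for p in atoms])
            approx = [sum(c * p[i] for c, p in zip(coefs[s], atoms)) for i in range(len(x))]
            residuals[s] = [xi - ai for xi, ai in zip(x, approx)]
    codes = []
    for s in range(len(signals)):
        code = [0.0] * len(codebook)
        for j, c in zip(support, coefs[s]):
            code[j] = c
        codes.append(code)
    return codes


def reconstruct(code: list[float], codebook: list[list[float]]) -> list[float]:
    return [sum(c * d[i] for c, d in zip(code, codebook)) for i in range(len(codebook[0]))]


if __name__ == "__main__":
    rng = random.Random(1)
    feats = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(40)]  # stand-in descriptors
    codebook = kmeans(feats, k=6)
    vq_err = sum(sq_dist(f, reconstruct(vq_code(f, codebook), codebook)) for f in feats)
    sc_err = sum(sq_dist(f, reconstruct(somp([f], codebook, 3)[0], codebook)) for f in feats)
    print(f"VQ error {vq_err:.2f}  sparse-coding error {sc_err:.2f}")
```

Called with a single feature, `somp` reduces to ordinary OMP. Its first greedy pick minimises the error of a one-codeword least-squares fit, which can never be worse than VQ's fixed coefficient of 1 on the nearest codeword, and later refits only lower the error. So in this sketch the per-feature reconstruction error of the sparse code never exceeds the VQ error, which is consistent with the abstract's claim. Passing several features at once forces them onto a shared support, which is the "simultaneous" part of SOMP.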

Editors: Isabelle Guyon and Vassilis Athitsos.

Notes

  1. The depth values are normalized to [0, 255] in the depth videos.

  2. MoSIFT and 3D MoSIFT use the same strategy to detect interest points.

  3. Here, \(\beta _{1}=0.005\), following Ming et al. (2012).

  4. Here, \(\beta _{1}=\beta _{2}=0.005\).
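The notes state only that depth values are normalized to [0, 255]; they do not say how. A minimal sketch, assuming a linear min-max rescale of each frame (the helper `normalize_depth` and its `lo`/`hi` parameters are hypothetical, not part of the chapter):

```python
def normalize_depth(frame, lo=None, hi=None):
    """Linearly rescale a depth frame (rows of raw values) to integers in [0, 255].

    Hypothetical helper: lo/hi default to the frame's own min/max; passing fixed
    sensor limits instead keeps the scale consistent across frames.
    """
    values = [v for row in frame for v in row]
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi <= lo:  # flat frame: nothing to stretch
        return [[0 for _ in row] for row in frame]
    scale = 255.0 / (hi - lo)
    # Clamp to [lo, hi] first so out-of-range readings saturate at 0 or 255.
    return [[round((min(max(v, lo), hi) - lo) * scale) for v in row] for row in frame]


print(normalize_depth([[500, 1000], [1500, 2000]]))  # [[0, 85], [170, 255]]
```

Per-frame limits maximise contrast but let the mapping drift between frames; fixed limits are the safer choice when depth values must be compared over time.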

References

  • G. Bradski, The OpenCV Library, Dr. Dobb’s Journal of Software Tools, 2000

  • M. Brand, N. Oliver, A. Pentland. Coupled hidden Markov models for complex action recognition, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1997, pp. 994–999

  • C.C. Chang, C.J. Lin. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)

  • F.S. Chen, C.M. Fu, C.L. Huang, Hand gesture recognition using a real-time tracking method and hidden Markov models. Image Vis. Comput. 21, 745–758 (2003)

  • M. Chen, A. Hauptmann. MoSIFT: recognizing human actions in surveillance videos. Technical Report, 2009

  • H. Cooper, E.J. Ong, N. Pugeault, R. Bowden, Sign language recognition using sub-units. J. Mach. Learn. Res. 13, 2205–2231 (2012)

  • A. Corradini. Dynamic time warping for off-line recognition of a small gesture vocabulary, in IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 2001, pp. 82–89

  • N.H. Dardas, N.D. Georganas, Real-time hand gesture detection and recognition using bag-of-features and support vector machine techniques. IEEE Trans. Instrum. Meas. 60(11), 3592–3607 (2011)

  • P. Dollár, V. Rabaud, G. Cottrell, S. Belongie. Behavior recognition via sparse spatio-temporal features, in Proceedings of IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005, pp. 65–72

  • H.J. Escalante, I. Guyon. Principal motion: PCA-based reconstruction of motion histograms. Technical Memorandum, 2012

  • L. Fei-Fei, P. Perona, A Bayesian hierarchical model for learning natural scene categories. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2, 524–531 (2005)

  • F. Flórez, J.M. García, J. García, A. Hernández. Hand gesture recognition following the dynamics of a topology-preserving network, in Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 318–323

  • P.-E. Forssén, D.G. Lowe. Shape descriptors for maximally stable extremal regions, in IEEE 11th International Conference on Computer Vision, 2007, pp. 1–8

  • W.T. Freeman, M. Roth, Orientation histograms for hand gesture recognition. Proc. IEEE Int. Workshop Autom. Face Gesture Recognit. 12, 296–301 (1995)

  • W. Gao, G. Fang, D. Zhao, Y. Chen, A Chinese sign language recognition system based on SOFM/SRN/HMM. Pattern Recognit. 37(12), 2389–2402 (2004)

  • T. Guha, R.K. Ward, Learning sparse representations for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 34(8), 1576–1588 (2012)

  • S. Guo, Z. Wang, Q. Ruan, Enhancing sparsity via \(\ell _{p}\) (\(0<p<1\)) minimization for robust face recognition. Neurocomputing 99, 592–602 (2013)

  • I. Guyon, V. Athitsos, P. Jangyodsuk, B. Hamner, H.J. Escalante. ChaLearn gesture challenge: design and first results, in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 1–6

  • I. Guyon, V. Athitsos, P. Jangyodsuk, H.J. Escalante, B. Hamner. Results and analysis of the ChaLearn gesture challenge 2012. Technical Report, 2013

  • C. Harris, M. Stephens. A combined corner and edge detector, in Proceedings of Alvey Vision Conference, volume 15, p. 50, 1988

  • A. Hernández-Vela, M.A. Bautista, X. Perez-Sala, V. Ponce, X. Baró, O. Pujol, C. Angulo, S. Escalera. BoVDW: bag-of-visual-and-depth-words for gesture recognition, in 21st International Conference on Pattern Recognition (ICPR), 2012

  • D. Kim, J. Song, D. Kim, Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMMs. Pattern Recognit. 40(11), 3012–3026 (2007)

  • I. Laptev, On space-time interest points. Int. J. Comput. Vis. 64(2), 107–123 (2005)

  • J.F. Lichtenauer, E.A. Hendriks, M.J.T. Reinders, Sign language recognition by combining statistical DTW and independent classification. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 2040–2046 (2008)

  • Y. Linde, A. Buzo, R. Gray, An algorithm for vector quantizer design. IEEE Trans. Commun. 28(1), 84–95 (1980)

  • D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  • B.D. Lucas, T. Kanade. An iterative image registration technique with an application to stereo vision, in Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981

  • Y.M. Lui, Human gesture recognition on product manifolds. J. Mach. Learn. Res. 13, 3297–3321 (2012)

  • M.R. Malgireddy, I. Inwogu, V. Govindaraju. A temporal Bayesian model for classifying, detecting and localizing activities in video sequences, in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 43–48

  • A. Malima, E. Ozgur, M. Çetin. A fast algorithm for vision-based hand gesture recognition for robot control, in Proceedings of IEEE Signal Processing and Communications Applications, 2006, pp. 1–4

  • Y. Ming, Q. Ruan, A.G. Hauptmann. Activity recognition from RGB-D camera with 3D local spatio-temporal features, in Proceedings of IEEE International Conference on Multimedia and Expo, 2012, pp. 344–349

  • L.P. Morency, A. Quattoni, T. Darrell. Latent-dynamic discriminative models for continuous gesture recognition, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8

  • B.A. Olshausen, D.J. Field, Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis. Res. 37(23), 3311–3326 (1997)

  • V.I. Pavlovic, R. Sharma, T.S. Huang, Visual interpretation of hand gestures for human-computer interaction: a review. IEEE Trans. Pattern Anal. Mach. Intell. 19, 677–695 (1997)

  • A. Rakotomamonjy, Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms. Signal Process. 91(7), 1505–1526 (2011)

  • S. Reifinger, F. Wallhoff, M. Ablassmeier, T. Poitschke, G. Rigoll. Static and dynamic hand-gesture recognition for augmented reality applications, in Proceedings of the 12th International Conference on Human-computer Interaction: Intelligent Multimodal Interaction Environments, 2007, pp. 728–737

  • R. Yang, S. Sarkar, B. Loeding. Enhanced level building algorithm for the movement epenthesis problem in sign language recognition, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8

  • C. Shan, T. Tan, Y. Wei, Real-time hand tracking using a mean shift embedded particle filter. Pattern Recognit. 40(7), 1958–1970 (2007)

  • X. Shen, G. Hua, L. Williams, Y. Wu, Dynamic hand gesture recognition: an exemplar-based approach from motion divergence fields. Image Vis. Comput. 30(3), 227–235 (2012)

  • C. Sminchisescu, A. Kanaujia, Z. Li, D. Metaxas. Conditional models for contextual human motion recognition, in Tenth IEEE International Conference on Computer Vision, volume 2, 2005, pp. 1808–1815

  • H.I. Suk, B.K. Sin, S.W. Lee, Hand gesture recognition based on dynamic Bayesian network framework. Pattern Recognit. 43(9), 3059–3072 (2010)

  • T. Starner, J. Weaver, A. Pentland, Real-time American Sign Language recognition using desk and wearable computer based video. IEEE Trans. Pattern Anal. Mach. Intell. 20, 1371–1375 (1998)

  • J.A. Tropp, A.C. Gilbert, M.J. Strauss, Algorithms for simultaneous sparse approximation. Part I: greedy pursuit. Signal Process. 86(3), 572–588 (2006)

  • A. Vedaldi, B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms, http://www.vlfeat.org/, 2008

  • A.J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)

  • C.P. Vogler. American Sign Language Recognition: Reducing the Complexity of the Task with Phoneme-based Modeling and Parallel Hidden Markov Models. Ph.D. thesis, University of Pennsylvania, 2003

  • J. Wan, Q. Ruan, G. An, W. Li. Gesture recognition based on hidden Markov model from sparse representative observations, in IEEE 10th International Conference on Signal Processing (ICSP), 2012, pp. 1180–1183

  • H. Wang, M.M. Ullah, A. Kläser, I. Laptev, C. Schmid. Evaluation of local spatio-temporal features for action recognition, in Proceedings of British Machine Vision Conference, 2009

  • J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong. Locality-constrained linear coding for image classification, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3360–3367

  • S.B. Wang, A. Quattoni, L.P. Morency, D. Demirdjian, T. Darrell, Hidden conditional random fields for gesture recognition. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2, 1521–1527 (2006)

  • J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31, 210–227 (2009)

  • J. Yamato, J. Ohya, K. Ishii. Recognizing human action in time-sequential images using hidden Markov model, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1992, pp. 379–385

  • J. Yang, K. Yu, Y. Gong, T. Huang. Linear spatial pyramid matching using sparse coding for image classification, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1794–1801

  • M.H. Yang, N. Ahuja, M. Tabb, Extraction of 2D motion trajectories and its application to hand gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 24, 1061–1074 (2002)

  • Y. Du, F. Chen, W. Xu, Y. Li. Recognizing interaction activities using dynamic Bayesian network, in 18th International Conference on Pattern Recognition, volume 1, 2006, pp. 618–621

  • Y. Zhu, G. Xu, D.J. Kriegman, A real-time approach to the spotting, representation, and recognition of hand gestures for human-computer interaction. Comput. Vis. Image Underst. 85(3), 189–208 (2002)

Acknowledgements

We thank ChaLearn (http://chalearn.org) for providing the gesture database and gratefully acknowledge its directors. We would like to thank Isabelle Guyon (ChaLearn, Berkeley, California) for insightful comments and suggestions that improved our manuscript, and we are grateful to the editors and anonymous reviewers, whose instructive suggestions have improved the quality of this paper. This project was supported by the National Natural Science Foundation of China (60973060, 61003114, 61172128), the National 973 Program (2012CB316304), the Fundamental Research Funds for the Central Universities (2011JBM020, 2011JBM022), and the Program for Innovative Research Team in University of the Ministry of Education of China (IRT 201206).

Author information

Correspondence to Jun Wan.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Wan, J., Ruan, Q., Li, W., Deng, S. (2017). One-Shot Learning Gesture Recognition from RGB-D Data Using Bag of Features. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_11

  • DOI: https://doi.org/10.1007/978-3-319-57021-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57020-4

  • Online ISBN: 978-3-319-57021-1

  • eBook Packages: Computer Science, Computer Science (R0)
