One-Shot Learning Gesture Recognition from RGB-D Data Using Bag of Features

Wan, Jun; Ruan, Qiuqi; Li, Wei; Deng, Shuang

doi:10.1007/978-3-319-57021-1_11


Part of the book series: The Springer Series on Challenges in Machine Learning (SSCML)


Abstract

For one-shot learning gesture recognition, two important challenges are how to extract distinctive features and how to learn a discriminative model from only one training sample per gesture class. For feature extraction, a new spatio-temporal feature representation called 3D enhanced motion scale-invariant feature transform is proposed, which fuses RGB-D data. Compared with other features, the new feature set is invariant to scale and rotation and gives a more compact and richer visual representation. For learning a discriminative model, all features extracted from the training samples are clustered with the k-means algorithm to learn a visual codebook. Then, unlike traditional bag-of-features models, which use vector quantization (VQ) to map each feature to a single visual codeword, a sparse coding method named simultaneous orthogonal matching pursuit (SOMP) is applied, so that each feature is represented by a linear combination of a small number of codewords. Compared with VQ, SOMP leads to a much lower reconstruction error and achieves better performance. The proposed approach has been evaluated on the ChaLearn gesture database, and the result was ranked among the top-performing techniques in round 2 of the ChaLearn gesture challenge.
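The abstract contrasts hard vector quantization with sparse coding over a k-means codebook. The NumPy sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation. The function names (`kmeans_codebook`, `vq_reconstruct`, `somp`), the iteration count, the sparsity level and the random stand-in descriptors are all hypothetical. The atom-selection rule (summed absolute correlation across signals) follows the generic greedy SOMP of Tropp et al. (2006), not any setting confirmed for this chapter. The abstract does not say whether descriptors are coded one at a time or jointly, so `somp` accepts one or many columns.

```python
import numpy as np


def kmeans_codebook(features, k, iters=20, seed=0):
    """Learn a k x d visual codebook from n x d features with Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every feature to every center (n x k).
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers


def vq_reconstruct(x, codebook):
    """Hard vector quantization: replace x by its single nearest codeword."""
    d2 = ((codebook - x) ** 2).sum(axis=1)
    return codebook[d2.argmin()]


def somp(X, codebook, sparsity):
    """Greedy simultaneous OMP in the style of Tropp et al. (2006).

    X holds m signals as columns (d x m). All columns share one support of
    `sparsity` codewords; returns C (k x m) with X approx. codebook.T @ C.
    With m == 1 this is ordinary OMP.
    """
    if sparsity < 1:
        raise ValueError("sparsity must be >= 1")
    D = codebook.T                                    # d x k dictionary
    Dn = D / (np.linalg.norm(D, axis=0) + 1e-12)      # unit-norm atoms for selection
    R = np.array(X, dtype=float)                      # residual, starts at X
    support = []
    for _ in range(sparsity):
        # Score each atom by its summed absolute correlation with all residuals.
        scores = np.abs(Dn.T @ R).sum(axis=1)
        if support:
            scores[support] = -np.inf                 # never pick an atom twice
        support.append(int(scores.argmax()))
        # Least-squares refit of every signal on the current shared support.
        coef, *_ = np.linalg.lstsq(D[:, support], X, rcond=None)
        R = X - D[:, support] @ coef
    C = np.zeros((D.shape[1], X.shape[1]))
    C[support, :] = coef
    return C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(300, 16))  # hypothetical stand-in descriptors, not real data
    cb = kmeans_codebook(feats, k=32)
    vq_err = np.mean([np.sum((x - vq_reconstruct(x, cb)) ** 2) for x in feats])
    sc_err = np.mean([np.sum((x[:, None] - cb.T @ somp(x[:, None], cb, 3)) ** 2)
                      for x in feats])
    print(f"mean VQ error {vq_err:.3f}, mean sparse-coding error {sc_err:.3f}")
```

For a single descriptor, `somp` reduces to OMP, and its reconstruction error can never exceed the VQ error. The first selected atom already gives a least-squares fit at least as good as copying the nearest codeword, and each further atom can only lower the residual. This offers a simple intuition for the lower reconstruction error the abstract reports for sparse coding.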

Editors: Isabelle Guyon and Vassilis Athitsos.


Notes

1. The depth values in the depth videos are normalized to [0, 255].
2. MoSIFT and 3D MoSIFT use the same strategy to detect interest points.
3. Here, \(\beta _{1}=0.005\), following Ming et al. (2012).
4. Here, \(\beta _{1}=\beta _{2}=0.005\).
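Note 1 does not say how depth values are mapped into [0, 255]. As an illustration only, the sketch below assumes a linear per-video min-max rescaling; the function name `normalize_depth` is hypothetical, and the chapter (or the ChaLearn data release) may use a different mapping.

```python
import numpy as np


def normalize_depth(depth):
    """Linearly rescale a depth video (frames x H x W) to integers in [0, 255].

    Assumption: per-video min-max scaling. Zero (missing) sensor readings are
    not treated specially. A constant video maps to all zeros.
    """
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi == lo:
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)
```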

References

G. Bradski, The OpenCV Library, Dr. Dobb’s Journal of Software Tools, 2000
M. Brand, N. Oliver, A. Pentland. Coupled hidden Markov models for complex action recognition, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1997, pp. 994–999
C.C. Chang, C.J. Lin. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol., 2(3):27:1–27:27, 2011
F.S. Chen, C.M. Fu, C.L. Huang, Hand gesture recognition using a real-time tracking method and hidden Markov models. Image Vis. Comput. 21, 745–758 (2003)
M. Chen, A. Hauptmann. MoSIFT: Recognizing human actions in surveillance videos. Technical Report, 2009
H. Cooper, E.J. Ong, N. Pugeault, R. Bowden, Sign language recognition using sub-units. J. Mach. Learn. Res. 13, 2205–2231 (2012)
A. Corradini. Dynamic time warping for off-line recognition of a small gesture vocabulary, in IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 2001, pp. 82–89
N.H. Dardas, N.D. Georganas, Real-time hand gesture detection and recognition using bag-of-features and support vector machine techniques. IEEE Trans. Instrum. Meas. 60(11), 3592–3607 (2011)
P. Dollár, V. Rabaud, G. Cottrell, S. Belongie. Behavior recognition via sparse spatio-temporal features, in Proceedings of IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005, pp. 65–72
H.J. Escalante, I. Guyon. Principal motion: PCA-based reconstruction of motion histograms. Technical Memorandum, 2012
L. Fei-Fei, P. Perona, A Bayesian hierarchical model for learning natural scene categories. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2, 524–531 (2005)
F. Flórez, J.M. García, J. García, A. Hernández. Hand gesture recognition following the dynamics of a topology-preserving network, in Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 318–323
P.-E. Forssén, D.G. Lowe. Shape descriptors for maximally stable extremal regions, in IEEE 11th International Conference on Computer Vision, 2007, pp. 1–8
W.T. Freeman, M. Roth, Orientation histograms for hand gesture recognition. Proc. IEEE Int. Workshop Autom. Face Gesture Recognit. 12, 296–301 (1995)
W. Gao, G. Fang, D. Zhao, Y. Chen, A Chinese sign language recognition system based on SOFM/SRN/HMM. Pattern Recognit. 37(12), 2389–2402 (2004)
T. Guha, R.K. Ward, Learning sparse representations for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 34(8), 1576–1588 (2012)
S. Guo, Z. Wang, Q. Ruan, Enhancing sparsity via \(\ell_{p}\) (\(0<p<1\)) minimization for robust face recognition. Neurocomputing 99, 592–602 (2013)
I. Guyon, V. Athitsos, P. Jangyodsuk, B. Hamner, H.J. Escalante. ChaLearn gesture challenge: Design and first results, in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 1–6
I. Guyon, V. Athitsos, P. Jangyodsuk, H.J. Escalante, B. Hamner. Results and analysis of the ChaLearn gesture challenge 2012. Technical Report, 2013
C. Harris, M. Stephens. A combined corner and edge detector, in Proceedings of Alvey Vision Conference, volume 15, p. 50, 1988
A. Hernández-Vela, M.A. Bautista, X. Perez-Sala, V. Ponce, X. Baró, O. Pujol, C. Angulo, S. Escalera. BoVDW: Bag-of-visual-and-depth-words for gesture recognition, in 21st International Conference on Pattern Recognition (ICPR), 2012
D. Kim, J. Song, D. Kim, Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMMs. Pattern Recognit. 40(11), 3012–3026 (2007)
I. Laptev, On space-time interest points. Int. J. Comput. Vis. 64(2), 107–123 (2005)
J.F. Lichtenauer, E.A. Hendriks, M.J.T. Reinders, Sign language recognition by combining statistical DTW and independent classification. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 2040–2046 (2008)
Y. Linde, A. Buzo, R. Gray, An algorithm for vector quantizer design. IEEE Trans. Commun. 28(1), 84–95 (1980)
D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
B.D. Lucas, T. Kanade. An iterative image registration technique with an application to stereo vision, in Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981
Y.M. Lui, Human gesture recognition on product manifolds. J. Mach. Learn. Res. 13, 3297–3321 (2012)
M.R. Malgireddy, I. Inwogu, V. Govindaraju. A temporal Bayesian model for classifying, detecting and localizing activities in video sequences, in IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 43–48
A. Malima, E. Ozgur, M. Çetin. A fast algorithm for vision-based hand gesture recognition for robot control, in Proceedings of IEEE Signal Processing and Communications Applications, 2006, pp. 1–4
Y. Ming, Q. Ruan, A.G. Hauptmann. Activity recognition from RGB-D camera with 3D local spatio-temporal features, in Proceedings of IEEE International Conference on Multimedia and Expo, 2012, pp. 344–349
L.P. Morency, A. Quattoni, T. Darrell. Latent-dynamic discriminative models for continuous gesture recognition, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8
B.A. Olshausen, D.J. Field, Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis. Res. 37(23), 3311–3326 (1997)
V.I. Pavlovic, R. Sharma, T.S. Huang, Visual interpretation of hand gestures for human-computer interaction: a review. IEEE Trans. Pattern Anal. Mach. Intell. 19, 677–695 (1997)
A. Rakotomamonjy, Surveying and comparing simultaneous sparse approximation (or group-lasso) algorithms. Signal Process. 91(7), 1505–1526 (2011)
S. Reifinger, F. Wallhoff, M. Ablassmeier, T. Poitschke, G. Rigoll. Static and dynamic hand-gesture recognition for augmented reality applications, in Proceedings of the 12th International Conference on Human-Computer Interaction: Intelligent Multimodal Interaction Environments, 2007, pp. 728–737
Y. Ruiduo, S. Sarkar, B. Loeding. Enhanced level building algorithm for the movement epenthesis problem in sign language recognition, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8
C. Shan, T. Tan, Y. Wei, Real-time hand tracking using a mean shift embedded particle filter. Pattern Recognit. 40(7), 1958–1970 (2007)
X. Shen, G. Hua, L. Williams, Y. Wu, Dynamic hand gesture recognition: an exemplar-based approach from motion divergence fields. Image Vis. Comput. 30(3), 227–235 (2012)
C. Sminchisescu, A. Kanaujia, Z. Li, D. Metaxas. Conditional models for contextual human motion recognition, in Tenth IEEE International Conference on Computer Vision, volume 2, pp. 1808–1815, 2005
H.I. Suk, B.K. Sin, S.W. Lee, Hand gesture recognition based on dynamic Bayesian network framework. Pattern Recognit. 43(9), 3059–3072 (2010)
J. Weaver, T. Starner, A. Pentland, Real-time American sign language recognition using desk and wearable computer based video. IEEE Trans. Pattern Anal. Mach. Intell. 20, 1371–1375 (1998)
J.A. Tropp, A.C. Gilbert, M.J. Strauss, Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Process. 86(3), 572–588 (2006)
A. Vedaldi, B. Fulkerson. VLFeat: An open and portable library of computer vision algorithms, http://www.vlfeat.org/, 2008
A.J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)
C.P. Vogler. American Sign Language Recognition: Reducing the Complexity of the Task with Phoneme-based Modeling and Parallel Hidden Markov Models. Ph.D. thesis, University of Pennsylvania, 2003
J. Wan, Q. Ruan, G. An, W. Li. Gesture recognition based on hidden Markov model from sparse representative observations, in IEEE 10th International Conference on Signal Processing (ICSP), 2012, pp. 1180–1183
H. Wang, M.M. Ullah, A. Kläser, I. Laptev, C. Schmid. Evaluation of local spatio-temporal features for action recognition, in Proceedings of British Machine Vision Conference, 2009
J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong. Locality-constrained linear coding for image classification, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3360–3367
S.B. Wang, A. Quattoni, L.P. Morency, D. Demirdjian, T. Darrell, Hidden conditional random fields for gesture recognition. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 2, 1521–1527 (2006)
J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31, 210–227 (2009)
J. Yamato, J. Ohya, K. Ishii. Recognizing human action in time-sequential images using hidden Markov model, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1992, pp. 379–385
J. Yang, K. Yu, Y. Gong, T. Huang. Linear spatial pyramid matching using sparse coding for image classification, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1794–1801
M.H. Yang, N. Ahuja, M. Tabb, Extraction of 2D motion trajectories and its application to hand gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 24, 1061–1074 (2002)
D. Youtian, C. Feng, X. Wenli, L. Yongbin. Recognizing interaction activities using dynamic Bayesian network, in 18th International Conference on Pattern Recognition, volume 1, pp. 618–621, 2006
Y. Zhu, G. Xu, D.J. Kriegman, A real-time approach to the spotting, representation, and recognition of hand gestures for human-computer interaction. Comput. Vis. Image Underst. 85(3), 189–208 (2002)


Acknowledgements

We thank ChaLearn for providing the gesture database (http://chalearn.org), and we gratefully acknowledge its directors. We would like to thank Isabelle Guyon (ChaLearn, Berkeley, California) for her insightful comments and suggestions, which helped improve our manuscript. We are also grateful to the editors and anonymous reviewers, whose instructive suggestions have improved the quality of this paper. This project was supported by the National Natural Science Foundation (60973060, 61003114, 61172128), the National 973 Plans Project (2012CB316304), the Fundamental Research Funds for the Central Universities (2011JBM020, 2011JBM022) and the Program for Innovative Research Team in University of the Ministry of Education of China (IRT 201206).

Author information

Authors and Affiliations

Institute of Information Science, Beijing Jiaotong University, Beijing, 100044, China
Jun Wan, Qiuqi Ruan & Wei Li
China Machinery TDI International Engineering Co., Ltd., Beijing, 100083, China
Shuang Deng


Corresponding author

Correspondence to Jun Wan.

Editor information

Editors and Affiliations

University of Barcelona, Barcelona, Spain
Sergio Escalera
ChaLearn, Berkeley, California, USA
Isabelle Guyon
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
Vassilis Athitsos


About this chapter

Cite this chapter

Wan, J., Ruan, Q., Li, W., Deng, S. (2017). One-Shot Learning Gesture Recognition from RGB-D Data Using Bag of Features. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_11


DOI: https://doi.org/10.1007/978-3-319-57021-1_11
Published: 20 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57020-4
Online ISBN: 978-3-319-57021-1
eBook Packages: Computer Science, Computer Science (R0)
