
Challenges in Multi-modal Gesture Recognition

Chapter in: Gesture Recognition (The Springer Series on Challenges in Machine Learning)

Abstract

This paper surveys the state of the art in multimodal gesture recognition and introduces the JMLR special topic on gesture recognition 2011–2015. We began right at the start of the Kinect™ revolution, when inexpensive infrared cameras providing depth recordings became available. We published papers that used this technology, as well as more conventional sensors such as regular video cameras, to record data, thus providing a good overview of how machine learning and computer vision are applied to multimodal data in this application area. Notably, we organized a series of challenges and released several datasets recorded for that purpose, including tens of thousands of videos that remain available for further research. We also review recent state-of-the-art work on gesture recognition, organized by a proposed taxonomy, and discuss open challenges and future lines of research.

Editor: Zhuowen Tu.
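
As a concrete illustration of the multimodal setting the abstract refers to, the short sketch below shows one pattern common in this literature: separate per-modality classifiers (e.g., RGB video, depth, skeleton, audio) each score a pre-segmented gesture clip, and a weighted late fusion combines those scores into a single label. This is a minimal toy sketch; the gesture names, modality weights, and scores are illustrative assumptions, not the method of any particular system surveyed in the chapter.

    # Toy late-fusion gesture labeller (illustrative only; all names and numbers
    # below are assumptions made for the sake of the example).
    import numpy as np

    GESTURES = ["wave", "point", "swipe_left", "swipe_right"]

    def fuse_scores(per_modality_scores, weights):
        """Weighted late fusion: sum per-modality class scores, return best label."""
        total = np.zeros(len(GESTURES))
        for modality, scores in per_modality_scores.items():
            total += weights.get(modality, 0.0) * np.asarray(scores, dtype=float)
        return GESTURES[int(np.argmax(total))]

    if __name__ == "__main__":
        # Hypothetical per-class scores that separate classifiers for each
        # modality might output for a single pre-segmented gesture clip.
        scores = {
            "rgb":      [0.10, 0.60, 0.20, 0.10],
            "depth":    [0.15, 0.55, 0.20, 0.10],
            "skeleton": [0.05, 0.70, 0.15, 0.10],
            "audio":    [0.25, 0.25, 0.25, 0.25],  # uninformative modality
        }
        weights = {"rgb": 0.3, "depth": 0.3, "skeleton": 0.3, "audio": 0.1}
        print(fuse_scores(scores, weights))  # prints "point"

Real challenge entries typically replace the fixed weights with a learned combination and operate on continuous streams rather than pre-segmented clips, but the late-fusion idea is the same.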


Notes

  1. For round 1: http://www.kaggle.com/c/GestureChallenge. For round 2: http://www.kaggle.com/c/GestureChallenge2.

  2. http://gesture.chalearn.org/.

  3. https://www.kaggle.com/c/multimodal-gesture-recognition.

  4. https://www.codalab.org/competitions/.

  5. http://code.opencv.org/projects/opencv/wiki/VisionChallenge.

  6. http://gesture.chalearn.org/.


Acknowledgements

This work has been partially supported by ChaLearn Challenges in Machine Learning (http://chalearn.org), the Human Pose Recovery and Behavior Analysis Group (HuPBA research group: http://www.maia.ub.es/~sergio/), the Pascal2 network of excellence, NSF grants 1128296, 1059235, 1055062, 1338118, 1035913, 0923494, and Spanish project TIN2013-43478-P. Our sponsors include Microsoft and Texas Instruments, who donated prizes and provided technical support. The challenges were hosted by Kaggle.com and Codalab.org, which are gratefully acknowledged. We thank our co-organizers of the ChaLearn gesture and action recognition challenges: Miguel Reyes, Jordi Gonzalez, Xavier Baro, Jamie Shotton, Victor Ponce, Miguel Angel Bautista, and Hugo Jair Escalante.

Author information

Corresponding author: Sergio Escalera.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Escalera, S., Athitsos, V., Guyon, I. (2017). Challenges in Multi-modal Gesture Recognition. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_1


  • DOI: https://doi.org/10.1007/978-3-319-57021-1_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57020-4

  • Online ISBN: 978-3-319-57021-1

  • eBook Packages: Computer Science, Computer Science (R0)
