
Challenges in Multi-modal Gesture Recognition

Chapter in: Gesture Recognition (The Springer Series on Challenges in Machine Learning)

Abstract

This paper surveys the state of the art in multimodal gesture recognition and introduces the JMLR special topic on gesture recognition 2011–2015. We began right at the start of the Kinect™ revolution, when inexpensive infrared cameras providing depth recordings became available. We published papers that used this technology, as well as more conventional sensors such as regular video cameras, to record data, thus providing a good overview of how machine learning and computer vision are applied to multimodal data in this application area. Notably, we organized a series of challenges and released several datasets recorded for that purpose, including tens of thousands of videos that remain available for further research. We also review recent state-of-the-art work on gesture recognition, organized by a proposed taxonomy, and discuss open challenges and future lines of research.

Editor: Zhuowen Tu.
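
As a concrete illustration of the multimodal setting the abstract refers to, the short sketch below shows one pattern common in this literature: separate per-modality classifiers (e.g., RGB video, depth, skeleton, audio) each score a pre-segmented gesture clip, and a weighted late fusion combines those scores into a single label. This is a minimal toy sketch; the gesture names, modality weights, and scores are illustrative assumptions, not the method of any particular system surveyed in the chapter.

    # Toy late-fusion gesture labeller (illustrative only; all names and numbers
    # below are assumptions made for the sake of the example).
    import numpy as np

    GESTURES = ["wave", "point", "swipe_left", "swipe_right"]

    def fuse_scores(per_modality_scores, weights):
        """Weighted late fusion: sum per-modality class scores, return best label."""
        total = np.zeros(len(GESTURES))
        for modality, scores in per_modality_scores.items():
            total += weights.get(modality, 0.0) * np.asarray(scores, dtype=float)
        return GESTURES[int(np.argmax(total))]

    if __name__ == "__main__":
        # Hypothetical per-class scores that separate classifiers for each
        # modality might output for a single pre-segmented gesture clip.
        scores = {
            "rgb":      [0.10, 0.60, 0.20, 0.10],
            "depth":    [0.15, 0.55, 0.20, 0.10],
            "skeleton": [0.05, 0.70, 0.15, 0.10],
            "audio":    [0.25, 0.25, 0.25, 0.25],  # uninformative modality
        }
        weights = {"rgb": 0.3, "depth": 0.3, "skeleton": 0.3, "audio": 0.1}
        print(fuse_scores(scores, weights))  # prints "point"

Real challenge entries typically replace the fixed weights with a learned combination and operate on continuous streams rather than pre-segmented clips, but the late-fusion idea is the same.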


Notes

  1. For round 1: http://www.kaggle.com/c/GestureChallenge. For round 2: http://www.kaggle.com/c/GestureChallenge2.

  2. http://gesture.chalearn.org/.

  3. https://www.kaggle.com/c/multimodal-gesture-recognition.

  4. https://www.codalab.org/competitions/.

  5. http://code.opencv.org/projects/opencv/wiki/VisionChallenge.

  6. http://gesture.chalearn.org/.


Acknowledgements

This work has been partially supported by ChaLearn Challenges in Machine Learning (http://chalearn.org), the Human Pose Recovery and Behavior Analysis Group (HuPBA research group: http://www.maia.ub.es/~sergio/), the Pascal2 network of excellence, NSF grants 1128296, 1059235, 1055062, 1338118, 1035913, 0923494, and Spanish project TIN2013-43478-P. Our sponsors include Microsoft and Texas Instruments, who donated prizes and provided technical support. The challenges were hosted by Kaggle.com and Codalab.org, which are gratefully acknowledged. We thank our co-organizers of the ChaLearn gesture and action recognition challenges: Miguel Reyes, Jordi Gonzalez, Xavier Baro, Jamie Shotton, Victor Ponce, Miguel Angel Bautista, and Hugo Jair Escalante.

Author information

Corresponding author: Sergio Escalera.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Escalera, S., Athitsos, V., Guyon, I. (2017). Challenges in Multi-modal Gesture Recognition. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_1


  • DOI: https://doi.org/10.1007/978-3-319-57021-1_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57020-4

  • Online ISBN: 978-3-319-57021-1

  • eBook Packages: Computer Science, Computer Science (R0)
