Abstract
Existing techniques for Yoga pose recognition build classifiers on sophisticated handcrafted features computed from raw inputs captured in a controlled environment. These techniques often fail in complex real-world situations, which limits the practical applicability of existing Yoga pose recognition systems. This paper presents an alternative, computationally efficient approach for Yoga pose recognition in complex real-world environments using deep learning. To this end, a Yoga pose dataset was created with the participation of 27 individuals (8 males and 19 females); it consists of ten Yoga poses, namely Malasana, Ananda Balasana, Janu Sirsasana, Anjaneyasana, Tadasana, Kumbhakasana, Hasta Uttanasana, Paschimottanasana, Uttanasana, and Dandasana. The videos were captured using smartphone cameras with 4K resolution at a frame rate of 30 fps. For real-time recognition of Yoga poses, a three-dimensional convolutional neural network (3D CNN) architecture is designed and implemented. The designed architecture is a modified version of the C3D architecture, which was originally introduced for the recognition of human actions. In the proposed modified C3D architecture, the computationally intensive fully connected layers are pruned, and supplementary layers such as batch normalization and average pooling are introduced for computational efficiency. To the best of our knowledge, this is among the first studies to utilize the inherent spatial–temporal relationship among Yoga poses for their recognition. The designed 3D CNN architecture achieved a test recognition accuracy of 91.15% on the in-house Yoga pose dataset consisting of ten Yoga poses. Furthermore, on a publicly available dataset, the designed architecture achieved a competitive test recognition accuracy of 99.39%, along with a multifold improvement in execution speed compared to the existing state-of-the-art technique.
To promote further study, we will make the in-house created Yoga pose dataset publicly available to the research community.
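The abstract does not list the exact layer configuration of the modified C3D network, so the following is only a rough sketch of why pruning fully connected layers in favor of average pooling saves computation. It assumes the layer sizes commonly reported for the original C3D network (a 512-channel final convolutional block, a flattened feature of 512 x 1 x 4 x 4 for a 16 x 112 x 112 clip, and two hidden fully connected layers of 4096 units). It compares this against a hypothetical head made of global average pooling followed by a single ten-way classifier. The function names and the replacement head are illustrative assumptions, not the authors' implementation.

```python
# Illustrative parameter arithmetic only. All layer sizes below are
# assumptions based on the original C3D design, not the paper's exact model.


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def fc_head_params(flat_features, hidden_sizes, num_classes):
    """Parameters of a stack of fully connected layers ending in a classifier."""
    total = 0
    n_in = flat_features
    for n_out in list(hidden_sizes) + [num_classes]:
        total += dense_params(n_in, n_out)
        n_in = n_out
    return total


def gap_head_params(channels, num_classes):
    """Global average pooling has no weights; only the final classifier does."""
    return dense_params(channels, num_classes)


NUM_CLASSES = 10              # ten Yoga poses, as stated in the abstract
C3D_CHANNELS = 512            # assumed: channels of the last C3D conv block
C3D_FLAT = 512 * 1 * 4 * 4    # assumed: flattened pool5 output (8192 values)
C3D_HIDDEN = (4096, 4096)     # assumed: hidden FC layers of the original C3D

fc_params = fc_head_params(C3D_FLAT, C3D_HIDDEN, NUM_CLASSES)
gap_params = gap_head_params(C3D_CHANNELS, NUM_CLASSES)

if __name__ == "__main__":
    print(f"Fully connected head parameters:    {fc_params:,}")
    print(f"Average-pooling head parameters:    {gap_params:,}")
    print(f"Reduction factor (head only):       {fc_params / gap_params:.0f}x")
```

Under these assumptions, nearly all of the head's parameters sit in the first two fully connected layers, which is consistent with the abstract's description of those layers as computationally intensive. The actual savings in the proposed architecture depend on its real layer sizes, which are not given here.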
References
Kidokuchi L (2008) The philosophy of Yoga. http://spot.pcc.edu/~lkidoguc/Yoga/Yoga01.htm. Accessed 13 November 2019
Chen HT, He YZ, Hsu CC et al (2014) Yoga posture recognition for self-training. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), pp. 496–505
Sathyanarayanan G, Vengadavaradan A, Bharadwaj B (2019) Role of yoga and mindfulness in severe mental illnesses: a narrative review. Int J Yoga 12:3–28. https://doi.org/10.4103/ijoy.IJOY_65_1
Guddeti RR, Dang G, Williams MA, Alla VM (2018) Role of Yoga in cardiac disease and rehabilitation. J Cardiopulm Rehabil Prev. https://doi.org/10.1097/hcr.0000000000000372
Sethi JK, Nagendra H, Ganpat TS (2013) Yoga improves attention and self-esteem in underprivileged girl student. J Educ Health Promot 2:55
Wilhelm FH, Grossman P, Coyle MA (2004) Improving estimation of cardiac vagal tone during spontaneous breathing using a paced breathing calibration. Biomed Sci Instrum 40:317–324
Risher B (2019) Yoga in schools really works: this is how one program helps students decompress. https://www.yogajournal.com/lifestyle/yoga-and-mindfulness-programs-for-schools. Accessed 14 November 2019
Schure MB, Christopher J, Christopher S (2008) Mind–body medicine and the art of self-care: teaching mindfulness to counseling students through yoga, meditation, and qigong. J Couns Dev. https://doi.org/10.1002/j.1556-6678.2008.tb00625.x
Lim S-A, Cheong K-J (2015) Regular Yoga practice improves antioxidant status, immune function, and stress hormone releases in young healthy people: a randomized, double-blind, controlled pilot study. J Altern Complement Med 1:1. https://doi.org/10.1089/acm.2014.0044
Chen HT, He YZ, Hsu CC (2018) Computer-assisted yoga training system. Multimed Tools Appl 77:23969–23991. https://doi.org/10.1007/s11042-018-5721-2
Gao Z, Zhang H, Liu AA et al (2016) Human action recognition on depth dataset. Neural Comput Appl 27:2047–2054. https://doi.org/10.1007/s00521-015-2002-0
Connaghan D, Kelly P, O’Connor NE et al (2011) Multi-sensor classification of tennis strokes. Proc IEEE Sens. https://doi.org/10.1109/icsens.2011.6127084
Nordsborg NB, Espinosa HG, Thiel DV (2014) Estimating energy expenditure during front crawl swimming using accelerometers. Procedia Eng 72:132–137. https://doi.org/10.1016/j.proeng.2014.06.024
Pai PF, ChangLiao LH, Lin KP (2017) Analyzing basketball games by a support vector machines with decision tree model. Neural Comput Appl 28:4159–4167. https://doi.org/10.1007/s00521-016-2321-9
Bai L, Efstratiou C, Ang CS (2016) WeSport: utilising wrist-band sensing to detect player activities in basketball games. In: 2016 IEEE international conference on pervasive computing and communication workshops, PerCom workshops 2016. IEEE. pp. 1–6
Shan CZ, Su E, Ming L (2015) Investigation of upper limb movement during badminton smash. In: 2015 10th Asian control conference, pp 1–6. https://doi.org/10.1109/ascc.2015.7244605
Waldron M, Twist C, Highton J et al (2011) Movement and physiological match demands of elite rugby league using portable global positioning systems. J Sports Sci 29:1223–1230. https://doi.org/10.1080/02640414.2011.587445
Kelly P, Healy A, Moran K, O’Connor NE (2010) A virtual coaching environment for improving golf swing technique. In: Proceedings of the 2010 ACM workshop on Surreal media and virtual cloning, ACM. pp. 51–56
Yang Y, Ramanan D (2011) Articulated pose estimation with flexible mixtures-of-parts. In: CVPR 2011, IEEE, pp 1385–1392
Wang F, Li Y (2013) Beyond physical connections: Tree models in human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 596–603
Patil S, Pawar A, Peshave A et al (2011) Yoga tutor: visualization and analysis using SURF algorithm. In: Proceedings of 2011 IEEE control system graduate research colloquium, ICSGRC 2011. pp. 43–46
Toshev A, Szegedy C (2013) DeepPose: human pose estimation via deep neural networks. https://doi.org/10.1109/cvpr.2014.214
Luo Z, Yang W, Ding ZQ, Liu L, Chen IM, Yeo SH, Ling KV, Duh HBL (2011) “left arm up!” interactive yoga training in virtual environment. In: 2011 IEEE virtual reality conference. IEEE. pp. 261–262
Hsieh CC, Wu BS, Lee CC (2011) A distance computer vision assisted yoga learning system. J Comput 6(11):2382–2388
Tompson JJ, Jain A, LeCun Y, Bregler C (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in neural information processing systems. pp 1799–1807
Qiang B, Zhang S, Zhan Y, Xie W, Zhao T (2019) Improved convolutional pose machines for human pose estimation using image sensor data. Sensors 19(3):718
Martinez J, Hossain R, Romero J, Little JJ (2017) A simple yet effective baseline for 3d human pose estimation. In: Proceedings of the IEEE international conference on computer vision. pp 2640–2649
Wang C, Wang Y, Lin Z, Yuille AL, Gao W (2014) Robust estimation of 3d human poses from a single image. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2361–2368
Cao Z, Simon T, Wei SE, Sheikh Y (2017) Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7291–7299
Fang HS, Xie S, Tai YW, Lu C (2017) Rmpe: Regional multi-person pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp. 2334–2343
Liu Y, Stoll C, Gall J, Seidel HP, Theobalt C (2011) Markerless motion capture of interacting characters using multi-view image segmentation. In: CVPR 2011, IEEE, pp 1249–1256
Alp Guler R, Neverova N, Kokkinos I (2018) Densepose: dense human pose estimation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7297–7306
Joo H, Liu H, Tan L, Gui L, Nabbe B, Matthews I, Kanade T, Nobuhara S, Sheikh Y (2015) Panoptic studio: a massively multiview system for social motion capture. In: Proceedings of the IEEE international conference on computer vision, pp. 3334–3342
Dantone M, Gall J, Leistner C, Van Gool L (2013) Human pose estimation using body parts dependent joint regressors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3041–3048
Tian Y, Zitnick CL, Narasimhan SG (2012) Exploring the spatial hierarchy of mixture models for human pose estimation. In: European Conference on Computer Vision, Springer, pp 256–269
Sapp B, Taskar B (2013) Modec: Multimodal decomposable models for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3674–3681
Pishchulin L, Andriluka M, Gehler P, Schiele B (2013) Poselet conditioned pictorial structures. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 588–595
Shotton J, Sharp T, Kipman A, Fitzgibbon A, Finocchio M, Blake A, Cook M, Moore R (2013) Real-time human pose recognition in parts from single depth images. Commun ACM 56(1):116–124
Mohanty A, Ahmed A, Goswami T, Das A, Vaishnavi P, Sahay RR (2017) Robust pose recognition using deep learning. In: Proceedings of international conference on computer vision and image processing, Springer. pp. 93–105
Yadav SK, Singh A, Gupta A, Raheja J (2019) Real-time yoga recognition using deep learning. Neural Comput Appl 31:9349. https://doi.org/10.1007/s00521-019-04232-7
Ji S, Xu W, Yang M, Yu K (2012) 3d convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1725–1732
Varol G, Laptev I, Schmid C (2017) Long-term temporal convolutions for action recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1510–1517
Vanholder H (2016) Efficient inference with TensorRT
Ditty M, Karandikar A, Reed D (2018) NVIDIA’s Xavier SoC. In: Hot chips: a symposium on high performance chips
Acknowledgments
The work is carried out at CSIR-CEERI, Pilani, and the authors would like to thank the Director, CSIR-CEERI, Pilani, for providing the necessary infrastructure and technical support. We would also like to acknowledge the consistent encouragement and motivation by the Head of the Cognitive Computing Group at CSIR-CEERI, Pilani. The authors would also like to thank all the volunteers for their active participation in the database preparation. We would also like to acknowledge Yadav et al. for making their dataset publicly available.
Ethics declarations
Conflict of interest
The authors declare they have no conflicts of interest.
Cite this article
Jain, S., Rustagi, A., Saurav, S. et al. Three-dimensional CNN-inspired deep learning architecture for Yoga pose recognition in the real-world environment. Neural Comput & Applic 33, 6427–6441 (2021). https://doi.org/10.1007/s00521-020-05405-5
DOI: https://doi.org/10.1007/s00521-020-05405-5