Three-dimensional CNN-inspired deep learning architecture for Yoga pose recognition in the real-world environment

Jain, Shrajal; Rustagi, Aditya; Saurav, Sumeet; Saini, Ravi; Singh, Sanjay

doi:10.1007/s00521-020-05405-5

Original Article
Published: 09 October 2020

Volume 33, pages 6427–6441 (2021)

Neural Computing and Applications

Shrajal Jain¹,
Aditya Rustagi¹,
Sumeet Saurav ORCID: orcid.org/0000-0002-4375-4107²,
Ravi Saini² &
…
Sanjay Singh²

1885 Accesses
41 Citations

Abstract

Existing techniques for Yoga pose recognition build classifiers on sophisticated handcrafted features computed from raw inputs captured in a controlled environment. These techniques often fail in complex real-world situations and thus limit the practical applicability of existing Yoga pose recognition systems. This paper presents an alternative, computationally efficient approach for Yoga pose recognition in complex real-world environments using deep learning. To this end, a Yoga pose dataset was created with the participation of 27 individuals (8 males and 19 females). It consists of ten Yoga poses, namely Malasana, Ananda Balasana, Janu Sirsasana, Anjaneyasana, Tadasana, Kumbhakasana, Hasta Uttanasana, Paschimottanasana, Uttanasana, and Dandasana. The videos were captured using smartphone cameras with 4K resolution and a frame rate of 30 fps. For real-time recognition of Yoga poses, a three-dimensional convolutional neural network (3D CNN) architecture was designed and implemented. The designed architecture is a modified version of the C3D architecture, which was originally introduced for the recognition of human actions. In the proposed modified C3D architecture, the computationally intensive fully connected layers are pruned, and supplementary layers such as batch normalization and average pooling are introduced for computational efficiency. To the best of our knowledge, this is among the first studies to utilize the inherent spatial–temporal relationship among Yoga poses for their recognition. The designed 3D CNN architecture achieved a test recognition accuracy of 91.15% on the in-house Yoga pose dataset consisting of ten Yoga poses. Furthermore, on a publicly available dataset, the designed architecture achieved a competitive test recognition accuracy of 99.39%, along with a multifold improvement in execution speed compared to the existing state-of-the-art technique. To promote further study, we will make the in-house Yoga pose dataset publicly available to the research community.
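
To give a rough sense of why pruning the fully connected layers matters, the following sketch compares the parameter count of a C3D-style fully connected head with a head that applies global average pooling followed by a single classifier layer. It is an illustration only, not the authors' implementation: the layer sizes (a 512 × 1 × 4 × 4 feature map feeding two 4096-unit layers) are assumptions taken from the original C3D design, and the function names are hypothetical. The abstract does not state the exact configuration of the modified network.

```python
# Back-of-the-envelope parameter count for two classifier heads.
# ASSUMPTIONS (not stated in the abstract): the final feature map has
# 512 channels with a 1 x 4 x 4 temporal/spatial extent, and the pruned
# head resembles the original C3D head with two 4096-unit layers.

NUM_CLASSES = 10              # ten Yoga poses, as stated in the abstract
CHANNELS = 512                # assumed channels of the last conv block
FLATTENED = CHANNELS * 1 * 4 * 4  # assumed flattened size (channels x T x H x W)
HIDDEN = 4096                 # assumed width of each hidden dense layer


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def fc_head_params(num_classes: int) -> int:
    """C3D-style head: flatten -> dense -> dense -> classifier."""
    return (
        dense_params(FLATTENED, HIDDEN)
        + dense_params(HIDDEN, HIDDEN)
        + dense_params(HIDDEN, num_classes)
    )


def gap_head_params(num_classes: int) -> int:
    """Average-pooling head: pooling has no weights, so only the
    final classifier layer over the channel vector contributes."""
    return dense_params(CHANNELS, num_classes)


if __name__ == "__main__":
    fc = fc_head_params(NUM_CLASSES)
    gap = gap_head_params(NUM_CLASSES)
    print(f"fully connected head: {fc:,} parameters")
    print(f"average-pooling head: {gap:,} parameters")
    print(f"reduction factor:     {fc / gap:,.0f}x")
```

Under these assumed sizes the fully connected head holds about 50 million parameters, while the pooled head holds about five thousand, which is consistent with the abstract's description of the fully connected layers as the computationally intensive part.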

References

Kidokuchi L (2008) The philosophy of Yoga. http://spot.pcc.edu/~lkidoguc/Yoga/Yoga01.htm. Accessed 13 November 2019
Chen HT, He YZ, Hsu CC et al (2014) Yoga posture recognition for self-training. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), pp. 496–505
Sathyanarayanan G, Vengadavaradan A, Bharadwaj B (2019) Role of yoga and mindfulness in severe mental illnesses: a narrative review. Int J Yoga 12:3–28. https://doi.org/10.4103/ijoy.IJOY_65_1
Guddeti RR, Dang G, Williams MA, Alla VM (2018) Role of Yoga in cardiac disease and rehabilitation. J Cardiopulm Rehabil Prev. https://doi.org/10.1097/hcr.0000000000000372
Sethi JK, Nagendra H, Ganpat TS (2013) Yoga improves attention and self-esteem in underprivileged girl student. J Educ Health Promot 2:55
Wilhelm FH, Grossman P, Coyle MA (2004) Improving estimation of cardiac vagal tone during spontaneous breathing using a paced breathing calibration. Biomed Sci Instrum 40:317–324
Risher B (2019) Yoga in schools really works: this is how one program helps students decompress. https://www.yogajournal.com/lifestyle/yoga-and-mindfulness-programs-for-schools. Accessed 14 November 2019
Schure MB, Christopher J, Christopher S (2008) Mind–body medicine and the art of self-care: teaching mindfulness to counseling students through yoga, meditation, and qigong. J Couns Dev. https://doi.org/10.1002/j.1556-6678.2008.tb00625.x
Lim S-A, Cheong K-J (2015) Regular Yoga practice improves antioxidant status, immune function, and stress hormone releases in young healthy people: a randomized, double-blind, controlled pilot study. J Altern Complement Med 1:1. https://doi.org/10.1089/acm.2014.0044
Chen HT, He YZ, Hsu CC (2018) Computer-assisted yoga training system. Multimed Tools Appl 77:23969–23991. https://doi.org/10.1007/s11042-018-5721-2
Gao Z, Zhang H, Liu AA et al (2016) Human action recognition on depth dataset. Neural Comput Appl 27:2047–2054. https://doi.org/10.1007/s00521-015-2002-0
Connaghan D, Kelly P, O’Connor NE et al (2011) Multi-sensor classification of tennis strokes. Proc IEEE Sens. https://doi.org/10.1109/icsens.2011.6127084
Nordsborg NB, Espinosa HG, Thiel DV (2014) Estimating energy expenditure during front crawl swimming using accelerometers. Procedia Eng 72:132–137. https://doi.org/10.1016/j.proeng.2014.06.024
Pai PF, ChangLiao LH, Lin KP (2017) Analyzing basketball games by a support vector machines with decision tree model. Neural Comput Appl 28:4159–4167. https://doi.org/10.1007/s00521-016-2321-9
Bai L, Efstratiou C, Ang CS (2016) WeSport: utilising wrist-band sensing to detect player activities in basketball games. In: 2016 IEEE international conference on pervasive computing and communication workshops, PerCom workshops 2016. IEEE. pp. 1–6
Shan CZ, Su E, Ming L (2015) Investigation of upper limb movement during badminton smash. In: 2015 10th Asian control conference, pp 1–6. https://doi.org/10.1109/ascc.2015.7244605
Waldron M, Twist C, Highton J et al (2011) Movement and physiological match demands of elite rugby league using portable global positioning systems. J Sports Sci 29:1223–1230. https://doi.org/10.1080/02640414.2011.587445
Kelly P, Healy A, Moran K, O’Connor NE (2010) A virtual coaching environment for improving golf swing technique. In: Proceedings of the 2010 ACM workshop on Surreal media and virtual cloning, ACM. pp. 51–56
Yang Y, Ramanan D (2011) Articulated pose estimation with flexible mixtures-of-parts. In: CVPR 2011, IEEE, pp 1385–1392
Wang F, Li Y (2013) Beyond physical connections: Tree models in human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 596–603
Patil S, Pawar A, Peshave A et al (2011) Yoga tutor: visualization and analysis using SURF algorithm. In: Proceedings of 2011 IEEE control system graduate research colloquium, ICSGRC 2011. pp. 43–46
Toshev A, Szegedy C (2013) DeepPose: human pose estimation via deep neural networks. https://doi.org/10.1109/cvpr.2014.214
Luo Z, Yang W, Ding ZQ, Liu L, Chen IM, Yeo SH, Ling KV, Duh HBL (2011) “left arm up!” interactive yoga training in virtual environment. In: 2011 IEEE virtual reality conference. IEEE. pp. 261–262
Hsieh CC, Wu BS, Lee CC (2011) A distance computer vision assisted yoga learning system. J Comput 6(11):2382–2388
Tompson JJ, Jain A, LeCun Y, Bregler C (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in neural information processing systems. pp 1799–1807
Qiang B, Zhang S, Zhan Y, Xie W, Zhao T (2019) Improved convolutional pose machines for human pose estimation using image sensor data. Sensors 19(3):718
Martinez J, Hossain R, Romero J, Little JJ (2017) A simple yet effective baseline for 3d human pose estimation. In: Proceedings of the IEEE international conference on computer vision. pp 2640–2649
Wang C, Wang Y, Lin Z, Yuille AL, Gao W (2014) Robust estimation of 3d human poses from a single image. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2361–2368
Cao Z, Simon T, Wei SE, Sheikh Y (2017) Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7291–7299
Fang HS, Xie S, Tai YW, Lu C (2017) Rmpe: Regional multi-person pose estimation. In: Proceedings of the IEEE international conference on computer vision, pp. 2334–2343
Liu Y, Stoll C, Gall J, Seidel HP, Theobalt C (2011) Markerless motion capture of interacting characters using multi-view image segmentation. In: CVPR 2011, IEEE, pp 1249–1256
Alp Guler R, Neverova N, Kokkinos I (2018) Densepose: dense human pose estimation in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7297–7306
Joo H, Liu H, Tan L, Gui L, Nabbe B, Matthews I, Kanade T, Nobuhara S, Sheikh Y (2015) Panoptic studio: a massively multiview system for social motion capture. In: Proceedings of the IEEE international conference on computer vision, pp. 3334–3342
Dantone M, Gall J, Leistner C, Van Gool L (2013) Human pose estimation using body parts dependent joint regressors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3041–3048
Tian Y, Zitnick CL, Narasimhan SG (2012) Exploring the spatial hierarchy of mixture models for human pose estimation. In: European Conference on Computer Vision, Springer, pp 256–269
Sapp B, Taskar B (2013) Modec: Multimodal decomposable models for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3674–3681
Pishchulin L, Andriluka M, Gehler P, Schiele B (2013) Poselet conditioned pictorial structures. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 588–595
Shotton J, Sharp T, Kipman A, Fitzgibbon A, Finocchio M, Blake A, Cook M, Moore R (2013) Real-time human pose recognition in parts from single depth images. Commun ACM 56(1):116–124
Mohanty A, Ahmed A, Goswami T, Das A, Vaishnavi P, Sahay RR (2017) Robust pose recognition using deep learning. In: Proceedings of international conference on computer vision and image processing, Springer. pp. 93–105
Yadav SK, Singh A, Gupta A, Raheja J (2019) Real-time yoga recognition using deep learning. Neural Comput Appl 31:9349. https://doi.org/10.1007/s00521-019-04232-7
Ji S, Xu W, Yang M, Yu K (2012) 3d convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1725–1732
Varol G, Laptev I, Schmid C (2017) Long-term temporal convolutions for action recognition. IEEE Trans Pattern Anal Mach Intell 40(6):1510–1517
Vanholder H (2016) Efficient inference with TensorRT
Ditty M, Karandikar A, Reed D (2018) NVIDIA’s Xavier SoC. In: Hot chips: a symposium on high performance chips

Acknowledgments

The work was carried out at CSIR-CEERI, Pilani, and the authors would like to thank the Director, CSIR-CEERI, Pilani, for providing the necessary infrastructure and technical support. We would also like to acknowledge the consistent encouragement and motivation provided by the Head of the Cognitive Computing Group at CSIR-CEERI, Pilani. The authors also thank all the volunteers for their active participation in the database preparation, and Yadav et al. for making their dataset publicly available.

Author information

Authors and Affiliations

Department of Electrical Engineering, Birla Institute of Technology and Science (BITS), Pilani, 333 031, India
Shrajal Jain & Aditya Rustagi
Cognitive Computing Group, CSIR-Central Electronics Engineering Research Institute, Pilani, 333 031, India
Sumeet Saurav, Ravi Saini & Sanjay Singh

Corresponding author

Correspondence to Sumeet Saurav.

Ethics declarations

Conflict of interest

The authors declare they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jain, S., Rustagi, A., Saurav, S. et al. Three-dimensional CNN-inspired deep learning architecture for Yoga pose recognition in the real-world environment. Neural Comput & Applic 33, 6427–6441 (2021). https://doi.org/10.1007/s00521-020-05405-5

Received: 30 December 2019
Accepted: 29 September 2020
Published: 09 October 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00521-020-05405-5
