research-article

MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency

Authors:
Mingyi Shi
Shandong University, China, and AICFVE, Beijing Film Academy, China

Kfir Aberman
AICFVE, Beijing Film Academy, China, and Tel-Aviv University, Israel

Andreas Aristidou
University of Cyprus and RISE Research Centre, Cyprus

Taku Komura
Edinburgh University, United Kingdom

Dani Lischinski
Shandong University, China, The Hebrew University of Jerusalem, Israel, and AICFVE, Beijing Film Academy, China

Daniel Cohen-Or
Tel-Aviv University, Israel, and AICFVE, Beijing Film Academy, China

Baoquan Chen
CFCS, Peking University, China, and AICFVE, Beijing Film Academy, China

ACM Transactions on Graphics, Volume 40, Issue 1, Article No. 1, pp. 1–15. https://doi.org/10.1145/3407659

Published: 04 September 2020

Abstract

We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video. Previous methods rely on either rigging or inverse kinematics (IK) to associate a consistent skeleton with temporally coherent joint rotations. In contrast, our method is the first data-driven approach that directly outputs a kinematic skeleton, a complete and commonly used motion representation. At the core of our approach is a deep neural network with embedded kinematic priors, which decomposes sequences of 2D joint positions into two separate attributes: a single, symmetric skeleton encoded by bone lengths, and a sequence of 3D joint rotations together with global root positions and foot contact labels. These attributes are fed into an integrated forward kinematics (FK) layer that outputs 3D joint positions, which are compared against the ground truth. In addition, an adversarial loss is applied to the velocities of the recovered rotations to ensure that they lie on the manifold of natural joint rotations. The key advantage of our approach is that it learns to infer natural joint rotations directly from the training data, rather than assuming an underlying model or inferring them from joint positions with a data-agnostic IK solver. We show that enforcing a single consistent skeleton along with temporally coherent joint rotations constrains the solution space, leading to more robust handling of self-occlusions and depth ambiguities.
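The following is a minimal NumPy sketch of the general idea behind an FK layer. It shows how one shared set of bone lengths, combined with per-frame joint rotations and a root position, yields 3D joint positions. Everything here is an illustrative assumption and not MotioNet's published code: the 5-joint lower-body skeleton, `PARENTS`, `TEMPLATE_DIRS`, `BONE_SLOT`, the (w, x, y, z) quaternion convention, and every function name. A trainable version would be written in a differentiable framework so the position loss can back-propagate into the predicted lengths and rotations.

```python
import numpy as np

# Hypothetical 5-joint lower-body skeleton, for illustration only.
# Joint order: 0 pelvis (root), 1 left hip, 2 left knee, 3 right hip, 4 right knee.
PARENTS = [-1, 0, 1, 0, 3]

# Assumed fixed unit directions of each bone in the rest pose (the root has no bone).
TEMPLATE_DIRS = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
])

# Symmetry assumption: left/right bones read their length from a shared slot,
# so mirrored limbs always get identical lengths. Slot 0 = pelvis-to-hip, slot 1 = thigh.
BONE_SLOT = [None, 0, 1, 0, 1]


def symmetric_offsets(slot_lengths):
    """Expand per-slot bone lengths into per-joint rest-pose offsets of shape (J, 3)."""
    offsets = np.zeros_like(TEMPLATE_DIRS)
    for j, slot in enumerate(BONE_SLOT):
        if slot is not None:
            offsets[j] = TEMPLATE_DIRS[j] * slot_lengths[slot]
    return offsets


def quat_to_matrix(q):
    """Convert quaternions (..., 4) in (w, x, y, z) order to rotation matrices (..., 3, 3)."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)  # normalize to unit length
    w, x, y, z = np.moveaxis(q, -1, 0)
    return np.stack([
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)], -1),
        np.stack([2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)], -1),
        np.stack([2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)], -1),
    ], -2)


def forward_kinematics(offsets, local_quats, root_pos):
    """offsets: (J, 3); local_quats: (T, J, 4); root_pos: (T, 3) -> positions (T, J, 3)."""
    T, J, _ = local_quats.shape
    local_rot = quat_to_matrix(local_quats)  # (T, J, 3, 3)
    global_rot = np.zeros_like(local_rot)
    positions = np.zeros((T, J, 3))
    for j, p in enumerate(PARENTS):  # parents are listed before their children
        if p < 0:
            global_rot[:, j] = local_rot[:, j]
            positions[:, j] = root_pos
        else:
            # Accumulate rotations down the chain; the parent's global rotation orients this bone.
            global_rot[:, j] = global_rot[:, p] @ local_rot[:, j]
            positions[:, j] = positions[:, p] + np.einsum("tab,b->ta", global_rot[:, p], offsets[j])
    return positions
```

As a usage example, `forward_kinematics(symmetric_offsets([0.1, 0.45]), q, root)` with identity quaternions `q` and the root at (0, 1, 0) places the left knee at (0.1, 0.55, 0). The skeleton is a single set of slot lengths shared by all frames, and rotations preserve distances. As a result, every output frame has the same, left/right-symmetric bone lengths. This sketch illustrates that property, whatever the predicted rotations are.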

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
2. Computing methodologies
  1. Computer graphics
    1. Animation
      1. Motion processing

Published in
ACM Transactions on Graphics Volume 40, Issue 1
February 2021
139 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3420236
Editor:
Marc Alexa
TU Berlin, Germany
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 September 2020
- Accepted: 1 June 2020
- Revised: 1 March 2020
- Received: 1 October 2019

Author Tags
Pose estimation
motion analysis
motion capturing
Qualifiers
- Research
- Refereed
Article Metrics
- Total citations: 55
- Total downloads: 1,395
- Downloads (last 12 months): 241
- Downloads (last 6 weeks): 38