MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency

Published: 04 September 2020

Abstract

We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from a monocular video. While previous methods rely on either rigging or inverse kinematics (IK) to associate a consistent skeleton with temporally coherent joint rotations, our method is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used motion representation. At the crux of our approach lies a deep neural network with embedded kinematic priors, which decomposes sequences of 2D joint positions into two separate attributes: a single, symmetric skeleton encoded by bone lengths, and a sequence of 3D joint rotations associated with global root positions and foot contact labels. These attributes are fed into an integrated forward kinematics (FK) layer that outputs 3D positions, which are compared to a ground truth. In addition, an adversarial loss is applied to the velocities of the recovered rotations to ensure that they lie on the manifold of natural joint rotations. The key advantage of our approach is that it learns to infer natural joint rotations directly from the training data rather than assuming an underlying model, or inferring them from joint positions using a data-agnostic IK solver. We show that enforcing a single consistent skeleton along with temporally coherent joint rotations constrains the solution space, leading to a more robust handling of self-occlusions and depth ambiguities.
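The forward kinematics step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy, rotation matrices as the rotation representation, and a hypothetical three-joint chain. The names `forward_kinematics`, `rot_z`, `parents`, and `directions` are illustrative only. It shows how one fixed set of bone lengths combined with per-joint rotations yields 3D joint positions whose bone lengths stay constant by construction.

```python
import numpy as np


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def forward_kinematics(parents, directions, bone_lengths, local_rots, root_pos):
    """Compute global joint positions for one frame.

    parents[j]    : index of the parent of joint j (-1 for the root);
                    joints are assumed ordered so parents come first.
    directions[j] : unit rest-pose direction from the parent to joint j.
    bone_lengths  : length of the bone ending at joint j (0 for the root).
    local_rots[j] : 3x3 rotation of joint j relative to its parent.
    root_pos      : global position of the root joint.
    """
    num_joints = len(parents)
    # Rest-pose offsets: the skeleton is fully described by its bone lengths.
    offsets = directions * bone_lengths[:, None]
    global_rots = np.zeros((num_joints, 3, 3))
    positions = np.zeros((num_joints, 3))
    for j in range(num_joints):
        p = parents[j]
        if p < 0:
            global_rots[j] = local_rots[j]
            positions[j] = root_pos
        else:
            # Accumulate rotations down the chain, then place the joint
            # by rotating its rest offset with the parent's global rotation.
            global_rots[j] = global_rots[p] @ local_rots[j]
            positions[j] = positions[p] + global_rots[p] @ offsets[j]
    return positions


# Hypothetical chain: root -> joint 1 -> joint 2, bones of length 1 and 2.
parents = [-1, 0, 1]
directions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
bone_lengths = np.array([0.0, 1.0, 2.0])
local_rots = np.stack([rot_z(np.pi / 2), np.eye(3), np.eye(3)])
positions = forward_kinematics(
    parents, directions, bone_lengths, local_rots, np.zeros(3)
)
```

Because positions are rebuilt from the same `bone_lengths` in every frame, changing `local_rots` alters the pose but never the skeleton, which is the consistency property the abstract emphasizes.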

      • Published in

        ACM Transactions on Graphics  Volume 40, Issue 1
        February 2021
        139 pages
        ISSN:0730-0301
        EISSN:1557-7368
        DOI:10.1145/3420236

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 September 2020
        • Accepted: 1 June 2020
        • Revised: 1 March 2020
        • Received: 1 October 2019

        Qualifiers

        • research-article
        • Research
        • Refereed
