Abstract
We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from a monocular video. While previous methods rely on either rigging or inverse kinematics (IK) to associate a consistent skeleton with temporally coherent joint rotations, our method is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used motion representation. At the crux of our approach lies a deep neural network with embedded kinematic priors, which decomposes sequences of 2D joint positions into two separate attributes: a single, symmetric skeleton encoded by bone lengths, and a sequence of 3D joint rotations associated with global root positions and foot contact labels. These attributes are fed into an integrated forward kinematics (FK) layer that outputs 3D positions, which are compared to a ground truth. In addition, an adversarial loss is applied to the velocities of the recovered rotations to ensure that they lie on the manifold of natural joint rotations. The key advantage of our approach is that it learns to infer natural joint rotations directly from the training data rather than assuming an underlying model, or inferring them from joint positions using a data-agnostic IK solver. We show that enforcing a single consistent skeleton along with temporally coherent joint rotations constrains the solution space, leading to a more robust handling of self-occlusions and depth ambiguities.
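The forward kinematics step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the unit-quaternion `(w, x, y, z)` rotation format, the `rest_dirs` rest-pose bone directions, and the convention that a joint's position depends on its parent's global rotation are all assumptions chosen for illustration. The sketch only shows how a set of bone lengths and per-joint rotations, together with a root position, determine 3D joint positions.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_rotate(q, v):
    """Rotate 3D vector v by unit quaternion q via q * (0, v) * conj(q)."""
    w, x, y, z = q
    p = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), (w, -x, -y, -z))
    return p[1:]

def forward_kinematics(parents, rest_dirs, bone_lengths, local_rots, root_pos):
    """Compute 3D joint positions for one frame.

    parents[j]      index of joint j's parent, -1 for the root
                    (assumed: parents are listed before their children)
    rest_dirs[j]    unit direction of the bone ending at joint j in the rest pose
    bone_lengths[j] length of the bone ending at joint j (ignored for the root)
    local_rots[j]   rotation of joint j relative to its parent, as (w, x, y, z)
    root_pos        global position of the root joint
    """
    n = len(parents)
    global_rots = [None] * n
    positions = [None] * n
    for j in range(n):
        p = parents[j]
        if p < 0:
            global_rots[j] = local_rots[j]
            positions[j] = tuple(root_pos)
        else:
            # Scale the rest direction by the bone length, rotate it by the
            # parent's accumulated rotation, and attach it to the parent.
            offset = tuple(bone_lengths[j] * d for d in rest_dirs[j])
            rotated = quat_rotate(global_rots[p], offset)
            positions[j] = tuple(a + b for a, b in zip(positions[p], rotated))
            global_rots[j] = quat_mul(global_rots[p], local_rots[j])
    return positions

# Hypothetical three-joint chain: root -> joint 1 -> joint 2, bones along +y.
parents = [-1, 0, 1]
rest_dirs = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
bone_lengths = [0.0, 2.0, 1.0]
identity = (1.0, 0.0, 0.0, 0.0)
half = math.pi / 4  # half-angle of a 90 degree rotation about the z axis
root_turn = (math.cos(half), 0.0, 0.0, math.sin(half))

rest_pose = forward_kinematics(parents, rest_dirs, bone_lengths,
                               [identity] * 3, (0.0, 0.0, 0.0))
turned_pose = forward_kinematics(parents, rest_dirs, bone_lengths,
                                 [root_turn, identity, identity], (0.0, 0.0, 0.0))
```

In this toy chain, `rest_pose` places the joints along the +y axis, while `turned_pose` swings the whole chain onto the -x axis. Because the bone lengths are passed once and reused for every frame, each reconstructed pose shares the same skeleton by construction, which is the consistency property the abstract emphasizes.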
Index Terms
- MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency