
Lie-X: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups

Published in: International Journal of Computer Vision

An Author Correction to this article was published on 10 March 2018.

Abstract

Pose estimation, tracking, and action recognition of articulated objects from depth images are important and challenging problems that are normally considered separately. In this paper, we propose a unified paradigm based on Lie group theory that allows us to address these related problems collectively. Our approach applies to a wide range of articulated objects: we evaluate it empirically on laboratory animals, including mice and fish, as well as on human hands. On these applications it delivers competitive results against the state of the art and against non-trivial baselines, including convolutional neural networks and regression forest methods. Moreover, we create new sets of annotated depth data of articulated objects which, together with our code, are made publicly available.
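As a rough illustration of the Lie-group machinery such an approach builds on (a generic sketch, not the authors' implementation), articulated poses are typically parameterized by rotations in SO(3), manipulated through the exponential and logarithm maps between the group and its Lie algebra so(3). A minimal NumPy version of these two maps, via the Rodrigues formula, might look like:

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix (the so(3) 'hat' operator)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Exponential map so(3) -> SO(3) via the Rodrigues formula."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)  # near zero, exp is the identity rotation
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Logarithm map SO(3) -> so(3): recover the rotation vector from R."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    w_hat = (R - R.T) * (theta / (2.0 * np.sin(theta)))
    return np.array([w_hat[2, 1], w_hat[0, 2], w_hat[1, 0]])
```

Working in the tangent space so(3) (rotation vectors) rather than directly on rotation matrices is what lets regression-style pose estimators predict unconstrained vector increments and map them back onto the manifold.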



Change history

  • 10 March 2018

    The original author list did not accurately reflect the contributions of the following colleagues.

Notes

  1. Our datasets, code, and detailed information pertaining to the project can be found at a dedicated project webpage http://web.bii.a-star.edu.sg/~xuchi/Lie-X.html.

  2. The NYU dataset is publicly available at http://cims.nyu.edu/~tompson/NYU_Hand_Pose_Dataset.htm.


Acknowledgements

The project is partially supported by A*STAR JCO Grants 1431AFG120 and 15302FG149. Mouse and fish images were acquired with the help of Zoe Bichler, James Stewart, Suresh Jesuthasan, and Adam Claridge-Chang. Zilong Wang helped with the annotation of the mouse data, while Wei Gao and Ashwin Nanjappa helped implement the mouse baseline method.

Author information


Corresponding author

Correspondence to Li Cheng.

Additional information

Communicated by V. Lepetit.

A correction to this article is available online at https://doi.org/10.1007/s11263-018-1069-3.


About this article


Cite this article

Xu, C., Govindarajan, L.N., Zhang, Y. et al. Lie-X: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups. Int J Comput Vis 123, 454–478 (2017). https://doi.org/10.1007/s11263-017-0998-6
