research-article

Efficient and precise interactive hand tracking through joint, continuous optimization of pose and correspondences

Published: 11 July 2016
Abstract

Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems have prevented widespread adoption. Today's dominant paradigm uses machine learning for initialization and recovery, followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm but make several changes to the model fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real time on the CPU alone, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
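To make change (3) concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a 2D circle as a stand-in for the smooth hand surface, with "pose" reduced to a centre and radius and one correspondence angle per data point. All names (`fit_joint`, `residuals`, `jacobian`, `solve`, `energy`) are hypothetical, and the damped Gauss-Newton (Levenberg-Marquardt style) update is an assumed choice of non-linear optimizer. The point it illustrates is that pose and correspondences are updated together in a single step, rather than alternating closest-point search and pose update as in ICP.

```python
import math

def residuals(params, points):
    """Residuals S(t_i) - p_i, two per data point, for a circle S with centre (cx, cy), radius r."""
    cx, cy, r = params[:3]
    out = []
    for (px, py), t in zip(points, params[3:]):
        out.append(cx + r * math.cos(t) - px)
        out.append(cy + r * math.sin(t) - py)
    return out

def jacobian(params):
    """Analytic Jacobian; columns are [cx, cy, r, t_0, ..., t_{n-1}]."""
    r, ts = params[2], params[3:]
    n = len(ts)
    J = [[0.0] * (3 + n) for _ in range(2 * n)]
    for i, t in enumerate(ts):
        c, s = math.cos(t), math.sin(t)
        J[2 * i][0], J[2 * i][2], J[2 * i][3 + i] = 1.0, c, -r * s
        J[2 * i + 1][1], J[2 * i + 1][2], J[2 * i + 1][3 + i] = 1.0, s, r * c
    return J

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(m):
        piv = max(range(k, m), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, m):
            f = M[i][k] / M[k][k]
            for j in range(k, m + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * m
    for k in reversed(range(m)):
        x[k] = (M[k][m] - sum(M[k][j] * x[j] for j in range(k + 1, m))) / M[k][k]
    return x

def energy(params, points):
    """Sum of squared residuals."""
    return sum(v * v for v in residuals(params, points))

def fit_joint(points, centre, radius, iters=100):
    """Damped Gauss-Newton over pose (cx, cy, r) and all correspondences t_i at once."""
    cx, cy = centre
    # Initialise each correspondence at the closest point on the starting circle.
    params = [cx, cy, radius] + [math.atan2(py - cy, px - cx) for px, py in points]
    m, lam = len(params), 1e-3
    e = energy(params, points)
    for _ in range(iters):
        res, J = residuals(params, points), jacobian(params)
        H = [[sum(row[a] * row[b] for row in J) for b in range(m)] for a in range(m)]
        g = [sum(row[a] * v for row, v in zip(J, res)) for a in range(m)]
        for a in range(m):
            H[a][a] += lam  # damping keeps the linear system positive definite
        step = solve(H, [-v for v in g])
        trial = [p + d for p, d in zip(params, step)]
        e_trial = energy(trial, points)
        if e_trial < e:  # accept the step and reduce the damping
            params, e, lam = trial, e_trial, max(lam / 10.0, 1e-9)
        else:            # reject the step and damp harder
            lam *= 10.0
    return params[:3], e

# Noise-free synthetic data on a circle (assumed values, for illustration only).
true_c, true_r = (2.0, -1.0), 1.5
points = [(true_c[0] + true_r * math.cos(2 * math.pi * k / 12),
           true_c[1] + true_r * math.sin(2 * math.pi * k / 12)) for k in range(12)]
(cx, cy, r), final_energy = fit_joint(points, centre=(1.2, -0.4), radius=1.0)
print(cx, cy, r, final_energy)
```

Because each correspondence `t_i` appears only in its own two residuals, the normal equations are sparse. A larger implementation would normally exploit that structure rather than forming and solving a dense system as this sketch does.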

Supplemental Material

a143.mp4 (MP4, 386.7 MB)

Published in

ACM Transactions on Graphics, Volume 35, Issue 4
July 2016
1396 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2897824

Copyright © 2016 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States
