research-article
Artifacts Available

BundleFusion: Real-Time Globally Consistent 3D Reconstruction Using On-the-Fly Surface Reintegration

Published: 16 July 2017

Abstract

Real-time, high-quality, 3D scanning of large-scale scenes is key to mixed reality and robotic applications. However, scalability brings challenges of drift in pose estimation, introducing significant errors in the accumulated model. Approaches often require hours of offline processing to globally correct model errors. Recent online methods demonstrate compelling results but suffer from (1) needing minutes to perform online correction, preventing true real-time use; (2) brittle frame-to-frame (or frame-to-model) pose estimation, resulting in many tracking failures; or (3) supporting only unstructured point-based representations, which limit scan quality and applicability. We systematically address these issues with a novel, real-time, end-to-end reconstruction framework. At its core is a robust pose estimation strategy, optimizing per frame for a global set of camera poses by considering the complete history of RGB-D input with an efficient hierarchical approach. We remove the heavy reliance on temporal tracking and continually localize to the globally optimized frames instead. We contribute a parallelizable optimization framework, which employs correspondences based on sparse features and dense geometric and photometric matching. Our approach estimates globally optimized (i.e., bundle adjusted) poses in real time, supports robust tracking with recovery from gross tracking failures (i.e., relocalization), and re-estimates the 3D model in real time to ensure global consistency, all within a single framework. Our approach outperforms state-of-the-art online systems with quality on par with offline methods, but with unprecedented speed and scan completeness. Our framework leads to a comprehensive online scanning solution for large indoor environments, enabling ease of use and high-quality results.
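
The abstract says the 3D model is re-estimated in real time whenever poses are re-optimized, but it does not give the update rule. The sketch below is only an illustration of one way such a correction could work, assuming the model stores a weighted running average per voxel: a frame's old contribution is subtracted and the contribution under its corrected pose is added back. The names `Voxel`, `integrate`, `deintegrate`, and `reintegrate` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Voxel:
    """One cell of a hypothetical fused volume (assumed layout)."""
    value: float = 0.0   # weighted mean of the samples seen so far
    weight: float = 0.0  # sum of the sample weights


def integrate(v: Voxel, sample: float, w: float = 1.0) -> None:
    """Fold one observation into the running weighted mean."""
    total = v.weight + w
    v.value = (v.value * v.weight + sample * w) / total
    v.weight = total


def deintegrate(v: Voxel, sample: float, w: float = 1.0) -> None:
    """Undo an earlier integrate() call made with the same sample and weight."""
    total = v.weight - w
    if total <= 1e-9:
        # Nothing is left in this voxel, so reset it to the empty state.
        v.value, v.weight = 0.0, 0.0
        return
    v.value = (v.value * v.weight - sample * w) / total
    v.weight = total


def reintegrate(v: Voxel, old_sample: float, new_sample: float, w: float = 1.0) -> None:
    """Replace a frame's contribution after its pose estimate has changed."""
    deintegrate(v, old_sample, w)
    integrate(v, new_sample, w)


if __name__ == "__main__":
    voxel = Voxel()
    for s in (0.10, 0.30, 0.20):      # three frames observe this voxel
        integrate(voxel, s)
    reintegrate(voxel, 0.30, 0.05)    # the second frame's pose was corrected
    print(voxel)
```

Because the weighted mean is invertible, correcting one frame touches only the voxels that frame observed, and the result matches fusing the corrected samples from scratch up to floating-point rounding.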

Supplemental Material

tog-13.mp4 (mp4, 436.4 MB)



    • Published in

      ACM Transactions on Graphics, Volume 36, Issue 4
      August 2017
      2155 pages
      ISSN: 0730-0301
      EISSN: 1557-7368
      DOI: 10.1145/3072959

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 16 July 2017
      • Accepted: 1 January 2017
      • Revised: 1 December 2016
      • Received: 1 April 2016


      Qualifiers

      • research-article
      • Research
      • Refereed
