Large-scale, real-time 3D scene reconstruction on a mobile device

Published in Autonomous Robots

Abstract

Google’s Project Tango has made integrated depth sensing and onboard visual-inertial odometry available to mobile devices such as phones and tablets. In this work, we explore the problem of large-scale, real-time 3D reconstruction on a mobile device of this type. Solving this problem is a necessary prerequisite for many indoor applications, including navigation, augmented reality, and building scanning. The main challenges include dealing with noisy and low-frequency depth data and managing limited computational and memory resources. State-of-the-art approaches to large-scale dense reconstruction require large amounts of memory and high-performance GPU computing. Other existing 3D reconstruction approaches on mobile devices either build only a sparse reconstruction, offload their computation to other devices, or require long post-processing to extract the geometric mesh. In contrast, we can reconstruct and render a global mesh on the fly, using only the mobile device’s CPU, in very large (300 m\(^2\)) scenes, at a resolution of 2–3 cm. To achieve this, we divide the scene into spatial volumes indexed by a hash map. Each volume contains the truncated signed distance function for that area of space, as well as the mesh segment derived from the distance function. This approach allows us to focus computational and memory resources only on areas of the scene that are currently observed, and to leverage parallelization techniques for multi-core processing. Furthermore, we describe an on-device post-processing method for fusing datasets from multiple, independent trials in order to improve the quality and coverage of the reconstruction. We discuss how the particularities of the devices impact our algorithm and implementation decisions. Finally, we provide both qualitative and quantitative results on publicly available RGB-D datasets, and on datasets collected in real-time from two devices.
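
To make the data layout described above concrete, here is a minimal sketch, in C++, of how spatially hashed volumes of this kind can be organized: fixed-size chunks of TSDF voxels indexed by a hash map, with each chunk caching the mesh segment extracted from its own distance field. This is an illustrative sketch only, not the authors’ OpenChisel implementation; the type names, chunk size, and hash function are assumptions chosen for clarity.

```cpp
// Hypothetical sketch of a spatially hashed TSDF map (not the authors' code).
// The scene is split into fixed-size chunks that are created lazily, so memory
// is spent only on space that has actually been observed by the depth sensor.
#include <cstddef>
#include <unordered_map>
#include <vector>

struct ChunkID {                 // integer coordinates of a chunk in the grid
  int x, y, z;
  bool operator==(const ChunkID& o) const {
    return x == o.x && y == o.y && z == o.z;
  }
};

struct ChunkIDHash {             // a common prime-multiply spatial hash
  std::size_t operator()(const ChunkID& c) const {
    return (static_cast<std::size_t>(c.x) * 73856093u) ^
           (static_cast<std::size_t>(c.y) * 19349669u) ^
           (static_cast<std::size_t>(c.z) * 83492791u);
  }
};

struct Voxel {                   // truncated signed distance plus fusion weight
  float sdf = 0.0f;
  float weight = 0.0f;
};

struct Chunk {                   // one spatial volume: TSDF voxels + cached mesh
  static constexpr int kVoxelsPerSide = 16;   // e.g. 16^3 voxels of 2-3 cm each
  std::vector<Voxel> voxels = std::vector<Voxel>(
      kVoxelsPerSide * kVoxelsPerSide * kVoxelsPerSide);
  std::vector<float> meshVertices;            // re-extracted only when voxels change
  bool needsMeshUpdate = false;
};

// Chunks touched by the current depth image are found (or created) through the
// hash map; untouched chunks keep their previously extracted mesh segments.
using ChunkMap = std::unordered_map<ChunkID, Chunk, ChunkIDHash>;
```

Because chunks are allocated only where depth data has been observed, memory grows with the reconstructed surface rather than with the bounding volume of the scene, and independent chunks can be meshed in parallel on a multi-core CPU.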

Notes

  1. http://www.github.com/personalrobotics/OpenChisel.

Acknowledgements

This work was done with the support of Google’s Advanced Technology and Projects division (ATAP) for Project Tango. The authors thank Johnny Lee, Joel Hesch, Esha Nerurkar, Simon Lynen, Ryan Hickman, and other ATAP members for their close collaboration and support on this project.

Author information

Corresponding author

Correspondence to Jizhong Xiao.

Additional information

This work is supported in part by the U.S. Army Research Office under Grant No. W911NF0910565, the Federal Highway Administration (FHWA) under Grant No. DTFH61-12-H-00002, Google under Grant No. RF-CUNY-65789-00-43, Toyota USA under Grant No. 1011344, and the U.S. Office of Naval Research under Grant No. N000141210613.

This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 215717 KB)

About this article

Cite this article

Dryanovski, I., Klingensmith, M., Srinivasa, S.S. et al. Large-scale, real-time 3D scene reconstruction on a mobile device. Auton Robot 41, 1423–1445 (2017). https://doi.org/10.1007/s10514-017-9624-2

Keywords

Navigation