
Computers & Graphics

Volume 53, Part A, December 2015, Pages 44-53

40 years of Computer Graphics in Darmstadt
MVE—An image-based reconstruction environment

https://doi.org/10.1016/j.cag.2015.09.003

Highlights

  • End-to-end multi-view geometry reconstruction and texturing pipeline.

  • Multi-scale reconstruction approach.

Abstract

We present an image-based reconstruction system, the Multi-View Environment. MVE is an end-to-end multi-view geometry reconstruction system which takes photos of a scene as input and produces a textured surface mesh as output. The system covers a structure-from-motion algorithm, multi-view stereo reconstruction, generation of extremely dense point clouds, reconstruction of surfaces from point clouds, and surface texturing. In contrast to most image-based geometry reconstruction approaches, our system focuses on the reconstruction of multi-scale scenes, an important aspect in many areas such as cultural heritage. It allows the reconstruction of large datasets in which some detailed regions are captured at much higher resolution than the rest of the scene. Our system provides a graphical user interface for visual inspection of the individual steps of the pipeline, i.e., the structure-from-motion result, the multi-view stereo depth maps, and renderings of scenes and meshes.

Introduction

Acquiring geometric data from natural and man-made objects or scenes is a fundamental field of research in computer vision and graphics. 3D digitization is relevant for designers, the entertainment industry, and for the preservation as well as digital distribution of cultural heritage objects and sites. In this paper, we introduce MVE, the Multi-View Environment, a free software solution for low-cost geometry acquisition from images. The system takes as input a set of photos and provides the algorithmic steps necessary to obtain a high-quality surface mesh of the captured object as final output. This includes structure-from-motion, multi-view stereo, surface reconstruction and texturing.

Geometric acquisition approaches are broadly classified into active and passive scanning. Active scanning technologies for 3D data acquisition exist in various flavors. Time-of-flight and structured-light scanners are known to produce geometry with remarkable detail and accuracy, but these systems require expensive hardware as well as elaborate capture planning and execution. Real-time stereo systems such as the Kinect primarily exist for gaming, but are often used for real-time geometry acquisition. These systems are based on structured infrared light which is emitted into the scene. They are often of moderate quality and limited to indoor settings because of interference with the infrared component of sunlight. Finally, there is some concern that active systems may damage objects of cultural value due to intense light emission.

Passive scanning systems do not emit light, rely purely on the existing illumination, and will not physically affect the subject matter. The main advantage of these systems is the cheap capture setup which does not require special hardware: a consumer-grade camera (or just a smartphone) is enough to capture datasets. The reconstruction process is based on finding visual correspondences in the input images, which, compared to active systems, usually leads to less complete geometry and limits the scenes to static, well-textured surfaces. The modest demands on the capture setup, however, come at the cost of considerably more elaborate software to process the unstructured input. The standard pipeline for geometry reconstruction from images involves four major algorithmic steps (see Fig. 1):

  • Structure-from-Motion (SfM) infers the extrinsic camera parameters (position and orientation) and the camera calibration (focal length and radial distortion) by finding sparse but stable correspondences between images. A sparse point-based 3D representation of the subject is created as a by-product of camera reconstruction.

  • Multi-View Stereo (MVS) reconstructs dense 3D geometry by finding visual correspondences in the images using the estimated camera parameters. These correspondences are triangulated yielding dense 3D information.

  • Surface Reconstruction takes as input a dense point cloud or individual depth maps and produces a globally consistent surface mesh.

  • Surface Texturing computes a consistent texture for the surface mesh using the input images.
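To make the data flow between these four steps concrete, the sketch below drives an MVE-style command-line pipeline from a short Python script. It is a minimal illustration under assumptions: the tool names (makescene, sfmrecon, dmrecon, scene2pset, fssrecon, meshclean, texrecon), their options, and the file layout follow common MVE usage but may vary between releases, so the distribution's own documentation remains authoritative.

    import subprocess

    # Sketch of an MVE-style pipeline; tool names and flags are assumptions
    # for illustration and should be checked against the installed release.
    SCENE = "scene"      # MVE scene directory (hypothetical path)
    IMAGES = "photos"    # directory containing the input photographs
    SCALE = "2"          # depth-map scale level: larger values are faster but coarser

    def run(*cmd):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    run("makescene", "-i", IMAGES, SCENE)                      # import images into a scene
    run("sfmrecon", SCENE)                                     # structure-from-motion
    run("dmrecon", "-s" + SCALE, SCENE)                        # multi-view stereo depth maps
    run("scene2pset", "-F" + SCALE, SCENE,                     # fuse depth maps into a point set
        f"{SCENE}/pset-L{SCALE}.ply")
    run("fssrecon", f"{SCENE}/pset-L{SCALE}.ply",              # surface reconstruction
        f"{SCENE}/surface-L{SCALE}.ply")
    run("meshclean", "-t10", f"{SCENE}/surface-L{SCALE}.ply",  # remove low-confidence geometry
        f"{SCENE}/surface-L{SCALE}-clean.ply")
    run("texrecon", f"{SCENE}::undistorted",                   # texture the cleaned mesh
        f"{SCENE}/surface-L{SCALE}-clean.ply",
        f"{SCENE}/textured")

Each stage reads the output of the previous one from the scene directory, so individual steps can typically be re-run (for instance at a different scale) without repeating the whole pipeline.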

It is not surprising that software solutions for end-to-end passive geometry reconstruction are rare. The reason lies in the technical complexity and the effort required to create such tools. Many projects cover parts of the pipeline, such as Bundler [1], VisualSfM [2], or OpenMVG [3] for structure-from-motion reconstruction, PMVS [4] for multi-view stereo, and Poisson Surface Reconstruction [5] for mesh reconstruction. A few commercial software projects offer complete end-to-end pipelines covering SfM, MVS, surface reconstruction, and texturing; these include Arc3D, Agisoft Photoscan and Acute3D Smart3DCapture. All of them are, however, closed source and do not facilitate research. In contrast, we offer a complete pipeline as a free, open source software system, which was introduced in an earlier version of this paper [6].

Our system handles many kinds of scenes, such as compact objects, open outdoor scenes, and controlled studio datasets. It avoids filling holes in regions with insufficient data for a reliable reconstruction. This may leave holes in the surface but does not introduce the artificial geometry common to many global reconstruction approaches. Our software puts a special emphasis on multi-resolution datasets, which can contain very detailed regions within an otherwise less detailed scene. It has been shown that inferior results are produced if the multi-resolution nature of the input data is not considered properly [7], [8], [9].
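The notion of scale in this context can be made concrete: every depth sample covers a lateral footprint in the scene that grows with the sample's depth and shrinks with the camera's focal length, so close-up photos contribute much finer samples than distant overview shots. The small calculation below is an illustrative sketch of such a per-sample footprint (it is not code taken from MVE); multi-scale approaches in the spirit of [8], [9] attach a scale value of this kind to every sample so that coarse samples do not wash out fine detail.

    def sample_footprint(depth, focal_length_px, pixel_spacing=1.0):
        """Approximate lateral size (in scene units) covered by one depth sample.
        focal_length_px is the focal length in pixels; pixel_spacing accounts for
        depth maps computed at reduced resolution (e.g. 2 for half-size maps)."""
        return pixel_spacing * depth / focal_length_px

    # A close-up photo (depth 0.5 m) versus an overview photo (depth 10 m),
    # both taken with a focal length of 2000 pixels:
    print(sample_footprint(0.5, 2000.0))   # 0.00025 m per sample: fine detail
    print(sample_footprint(10.0, 2000.0))  # 0.005 m per sample: 20x coarser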

In the remainder of the paper we first give a technical overview of our system and introduce its individual components in Section 2. A few practical aspects and limitations of our system are discussed in Section 3. We then show reconstruction results on several datasets with different characteristics and demonstrate the versatility of our pipeline in Section 4. We briefly describe our software framework and conclude in Section 5.


System overview

Our system consists of four steps: structure-from-motion (SfM), which reconstructs the camera parameters; multi-view stereo (MVS), which establishes dense visual correspondences; a meshing step, which merges the MVS geometry into a globally consistent mesh; and finally a texturing step, which creates seamless textures from the input images. In the following, we give a concise overview of the process, using the Bronze Statue dataset as an example of a cultural heritage artifact, see Fig. 1. For a
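At the core of both the SfM and MVS steps, 3D structure is obtained by triangulating corresponding image observations from views with known camera parameters. The following numpy sketch shows the standard linear (DLT) two-view triangulation of a single point; it illustrates the general principle only and is not the specific implementation used in MVE.

    import numpy as np

    def triangulate_dlt(P1, P2, x1, x2):
        """Triangulate one 3D point from two pixel observations.
        P1, P2: 3x4 projection matrices (K [R|t]); x1, x2: 2D image points."""
        A = np.array([
            x1[0] * P1[2] - P1[0],
            x1[1] * P1[2] - P1[1],
            x2[0] * P2[2] - P2[0],
            x2[1] * P2[2] - P2[1],
        ])
        # The homogeneous 3D point is the right null vector of A (via SVD).
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        return X[:3] / X[3]

    # Example: two cameras one unit apart along the x-axis, both looking along +z.
    K = np.array([[1000.0, 0, 320], [0, 1000.0, 240], [0, 0, 1]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
    X_true = np.array([0.2, 0.1, 4.0, 1.0])
    h1, h2 = P1 @ X_true, P2 @ X_true
    print(triangulate_dlt(P1, P2, h1[:2] / h1[2], h2[:2] / h2[2]))  # ~ [0.2 0.1 4.0]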

Practical aspects

In this section we discuss some aspects that should be considered when using our image-based reconstruction system. We present some guidelines that can help users capture better input data in order to facilitate high-quality results. We also discuss some limitations of the presented approaches, which apply not only to our reconstruction system but more generally to this class of algorithms.

Reconstruction results

In the following, we show results on a few datasets we acquired over time. We selected a variety of scenarios to show the broad applicability of our system.

Duck: The first dataset, called Duck, was captured in a controlled studio environment and contains 160 images of a small, diffuse ceramic duck figurine, see Fig. 11. This is a relatively compact dataset with uniform scale as the images have the same resolution and are evenly spaced around the object. Notice that, although the individual

Conclusion

In this paper we presented MVE, the Multi-View Environment, a free and open 3D reconstruction application, relevant to the cultural heritage community. It is versatile, operates on a broad range of datasets, and is able to handle quite uncontrolled photos. It is thus suitable for reconstruction amateurs. Our focus on multi-scale data allows emphasis to be placed on interesting parts of larger scenes using close-up photos. We believe that the effort and expert knowledge that went into

Acknowledgments

Part of the research leading to these results has received funding from the European Commission's FP7 Framework Programme under grant agreements ICT-323567 (HARVEST4D) and ICT-611089 (CR-PLAY), the DFG Emmy Noether fellowship GO 1752/3-1 as well as the Intel Visual Computing Institute (Project RealityScan).

References (31)

  • H. Bay et al. Speeded-up robust features (SURF). Comput Vis Image Understand (CVIU) (2008)
  • M. Callieri et al. Masked photo blending: Mapping dense photographic dataset on high-resolution sampled 3D models. Comput Graph (2008)
  • N. Snavely et al. Photo tourism: Exploring photo collections in 3D. Trans Graph (2006)
  • Wu C. Towards linear-time incremental structure from motion. In: International conference on 3D vision (3DV). 2013, p....
  • Moulon P, Monasse P, Marlet R, et al. OpenMVG,...
  • Y. Furukawa et al. Accurate, dense, and robust multi-view stereopsis. Trans Pattern Anal Mach Intell (PAMI) (2010)
  • M. Kazhdan et al. Screened Poisson surface reconstruction. Trans Graph (2013)
  • Fuhrmann S, Langguth F, Goesele M. MVE – a multi-view reconstruction environment. In: Eurographics workshop on graphics...
  • Mücke P, Klowsky R, Goesele M. Surface reconstruction from multi-resolution sample points. In: Vision, Modeling and...
  • Fuhrmann S, Goesele M. Fusion of depth maps with multiple scales. In: SIGGRAPH Asia; 2011. p....
  • Fuhrmann S, Goesele M. Floating Scale Surface Reconstruction. In: SIGGRAPH,...
  • R. Szeliski. Computer Vision: Algorithms and Applications (2010)
  • Armstrong M, Zisserman A, Beardsley PA. Euclidean reconstruction from uncalibrated images. In: British Machine Vision...
  • Pollefeys M, Koch R, Gool LV. Self-calibration and metric reconstruction in spite of varying and unknown internal...
  • M. Pollefeys et al. 3D recording for archaeological field work. Comput Graph Appl (CGA) (2003)