Ancient Chinese architecture 3D preservation by merging ground and aerial point clouds

https://doi.org/10.1016/j.isprsjprs.2018.04.023

Abstract

Ancient Chinese architecture 3D digitalization and documentation is a challenging task for the image based modeling community due to its architectural complexity and structural delicacy. Currently, an effective approach to ancient Chinese architecture 3D reconstruction is to merge two point clouds, separately obtained from ground and aerial images by the SfM technique. Two outstanding issues should be specially addressed: (1) it is difficult to find point matches between images from different sources due to their remarkable variations in viewpoint and scale; (2) due to the inevitable drift phenomenon in any SfM reconstruction process, the resulting two point clouds are no longer strictly related by a single similarity transformation, as they theoretically should be. To address these two issues, a new point cloud merging method is proposed in this work. Our method has the following characteristics: (1) the images are matched by leveraging sparse mesh based image synthesis; (2) the putative point matches are filtered by a geometrical consistency check and geometrical model verification; and (3) the two point clouds are merged via bundle adjustment by linking the ground-to-aerial tracks. Extensive experiments show that our method outperforms many state-of-the-art approaches in terms of ground-to-aerial image matching and point cloud merging.

Introduction

Ancient Chinese architecture is an important component of the world architecture system, and its most significant characteristic is the timber framework. Although this framework permits more delicate structures than many other architectural styles, it also makes ancient Chinese architecture more vulnerable to natural disasters, e.g. fire and earthquake. As a result, there is an urgent need to preserve ancient Chinese architecture, and one of the best means is to digitally preserve it by reconstructing complete and detailed 3D models (Ikeuchi et al., 2007, Banno et al., 2008).

3D digitalization of architecture is an intensive research topic in the fields of computer vision and computer graphics. There are usually two ways for data acquisition: active vision based laser scanning and passive vision based image capturing. Laser scanning based methods (Nan et al., 2010, Lafarge and Mallet, 2011, Li et al., 2016), which are widely used in urban scene reconstruction, are not suitable for 3D digitalization of ancient Chinese architecture. The reasons are twofold. First, since ancient Chinese architecture is structurally complicated, multi-viewpoint and close-range scanning is necessary to achieve a complete architectural model, which is inconvenient and impractical for cumbersome laser scanners, in particular for roof scanning. Second, projected lasers of certain frequencies may damage the materials and paintings of ancient Chinese architecture. In contrast, image capturing based methods possess the strengths of low cost and flexibility and are harmless to ancient Chinese architecture. As a result, image based methods are preferred in this paper.

Image based architectural scene reconstruction is a classical and fundamental problem in the research fields of computer vision (Snavely et al., 2008, Furukawa and Ponce, 2010, Cui et al., 2015, Zheng et al., 2014) and remote sensing (Bartelsen et al., 2012, Mancini et al., 2013, Rottensteiner et al., 2014). Thanks to recent developments in algorithm efficiency and hardware performance, reconstruction systems have been extended from single buildings to an urban scale (Agarwal et al., 2011), and nowadays even to a worldwide scale (Heinly et al., 2015). In order to achieve a complete 3D digitized model of ancient Chinese architecture that captures details of complex structures, e.g. cornices and brackets, two different sources of images, ground and aerial, are usually needed for close-range and large-scale photography (Bódis-Szomorú et al., 2016). When using both ground and aerial images, a common practice is to first reconstruct the ground and aerial point clouds separately and then merge them afterwards. Considering the noisy nature of 3D point clouds reconstructed from image collections, and the loss in 3D point clouds of the rich textural and contextual information of 2D images, it is preferable to merge the point clouds via 2D image feature point matching rather than by direct 3D point cloud registration, e.g. ICP (Besl and McKay, 1992). The difficulties of merging ground and aerial point clouds are illustrated in Fig. 1, where Fig. 1a shows an example pair of ground and aerial images and Fig. 1b shows the ground and aerial sparse point clouds reconstructed from the images. Note that the sparse point cloud in this paper consists of the feature points (e.g. SIFT features) reconstructed by the structure-from-motion (SfM) procedure, whereas the dense point cloud consists of the points reconstructed by the multi-view stereo (MVS) procedure via pixel-wise dense matching. Only the sparse point clouds are involved in the point cloud merging process. Fig. 1 shows that there are two key issues for architectural scene reconstruction from ground and aerial images: (1) how to match the ground and aerial images with substantial variations in viewpoint and scale (Fig. 1a), and (2) how to merge the ground and aerial point clouds, which exhibit drift phenomena and notable differences in noise level, density, and accuracy (Fig. 1b).
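For reference, the baseline that the drift phenomenon invalidates, aligning the two sparse point clouds by a single similarity transformation, can be sketched with the closed-form Umeyama (1991) solution below. This is not the method proposed in this paper; the function name and the assumption of known, outlier-free 3D point correspondences are illustrative only.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Estimate a similarity transform (s, R, t) with dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Closed-form least-squares solution of Umeyama (1991).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)           # 3x3 cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                           # guard against reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src     # isotropic scale
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Because SfM drift bends each reconstruction nonrigidly, no single (s, R, t) of this form can align the two clouds exactly, which is precisely why the paper merges them by bundle adjustment instead.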

To deal with the ground-to-aerial image matching problem, in this paper the ground image is warped to the viewpoint of the aerial image, by which the differences in viewpoint and scale between the two kinds of images are eliminated. Unlike the method proposed in Shan et al. (2014), which synthesizes the aerial-view image using the spatially discrete ground dense point cloud, the image synthesis method here resorts to the spatially continuous ground sparse mesh, which is reconstructed from the ground sparse point cloud. Then, the synthetic image is matched with the target aerial image by SIFT feature point extraction and matching. Subsequently, the putative point match outliers are filtered out by the following two techniques: (1) a consistency check of the feature scales and principal orientations between the point matches and (2) an affine transformation verification of the feature locations between the point matches.
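To make the two filtering techniques concrete, the following sketch matches a synthesized aerial-view image against a real aerial image with SIFT, checks the consistency of feature scales and principal orientations, and then verifies an affine model with RANSAC. It uses OpenCV for concreteness; the thresholds (ratio, scale_tol, angle_tol, and the reprojection threshold) are illustrative assumptions, not the values used in the paper.

```python
import cv2
import numpy as np

def match_and_filter(synth_img, aerial_img, ratio=0.8,
                     scale_tol=1.5, angle_tol=30.0):
    """Match a synthesized aerial-view image against a real aerial image,
    then filter putative matches by keypoint scale/orientation consistency
    and a RANSAC-verified affine model (illustrative thresholds)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(synth_img, None)
    kp2, des2 = sift.detectAndCompute(aerial_img, None)

    # Lowe's ratio test on putative nearest-neighbor matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    putative = [m for m, n in matcher.knnMatch(des1, des2, k=2)
                if m.distance < ratio * n.distance]

    # (1) Geometrical consistency check: the synthetic image is already
    # rendered at the aerial viewpoint, so matched features should have
    # similar SIFT scales and principal orientations.
    consistent = []
    for m in putative:
        s1, s2 = kp1[m.queryIdx].size, kp2[m.trainIdx].size
        da = abs(kp1[m.queryIdx].angle - kp2[m.trainIdx].angle) % 360.0
        da = min(da, 360.0 - da)
        if 1.0 / scale_tol < s1 / s2 < scale_tol and da < angle_tol:
            consistent.append(m)

    # (2) Geometrical model verification: fit an affine transformation of
    # the feature locations with RANSAC and keep only the inlier matches.
    pts1 = np.float32([kp1[m.queryIdx].pt for m in consistent])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in consistent])
    A, inlier_mask = cv2.estimateAffine2D(pts1, pts2,
                                          ransacReprojThreshold=4.0)
    if A is None:
        return []
    return [m for m, ok in zip(consistent, inlier_mask.ravel()) if ok]
```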

After matching the ground and aerial images, rather than aligning the point clouds by estimating a similarity transformation between them, the point clouds are merged together by a global bundle adjustment to deal with the possible scene drift phenomenon. To achieve that, the obtained point matches are linked to the original aerial tracks first, and then a global bundle adjustment is performed to merge the ground and aerial point clouds with the augmented aerial tracks and original ground tracks.
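The track-linking step can be sketched as a union-find merge over feature observations, producing the augmented tracks that a global bundle adjustment would then optimize. The data representation below (tracks as lists of (image_id, feature_id) observations) is an assumption for illustration; the paper's internal structures may differ, and the bundle adjustment itself is omitted.

```python
class TrackLinker:
    """Union-find over feature observations, used here to link ground
    tracks to aerial tracks through verified ground-to-aerial matches
    (a minimal sketch; an observation is an (image_id, feature_id) pair)."""

    def __init__(self):
        self.parent = {}

    def find(self, obs):
        self.parent.setdefault(obs, obs)
        while self.parent[obs] != obs:
            self.parent[obs] = self.parent[self.parent[obs]]  # path halving
            obs = self.parent[obs]
        return obs

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def link_tracks(ground_tracks, aerial_tracks, cross_matches):
    """Merge ground and aerial tracks into augmented tracks.

    ground_tracks, aerial_tracks: lists of observation lists.
    cross_matches: (ground_obs, aerial_obs) pairs from image matching.
    """
    uf = TrackLinker()
    for track in ground_tracks + aerial_tracks:
        for obs in track[1:]:
            uf.union(track[0], obs)           # observations of one track
    for g_obs, a_obs in cross_matches:
        uf.union(g_obs, a_obs)                # the ground-to-aerial links
    merged = {}
    for track in ground_tracks + aerial_tracks:
        merged.setdefault(uf.find(track[0]), []).extend(track)
    return list(merged.values())
```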

This work has the following three main contributions: (1) the aerial-view image is synthesized based on the ground sparse mesh, (2) the putative ground-to-aerial point matches are filtered by geometrical consistency check and geometrical model verification, and (3) the ground and aerial point clouds are merged via bundle adjustment by linking the ground-to-aerial tracks.

The rest of this paper is organized as follows: Section 2 introduces some related works. Our proposed method is described in Section 3 and evaluated in Section 4. Section 5 gives an extension of our proposed method. Finally, Section 6 offers some concluding remarks.

Section snippets

Related work

There are four main categories of works related to ours: ground-to-aerial image matching; point match outlier filtering; image synthesis and rendering; and ground-to-aerial point cloud alignment.

Proposed method

In this paper, complete architectural scene reconstruction is achieved by first matching the ground and aerial images and then merging the ground and aerial point clouds. The pipeline of the proposed method is shown in Fig. 2. The inputs of the method are the ground and aerial images, and the outputs are the merged ground and aerial sparse point clouds. The method contains three main steps: pre-processing, ground-to-aerial image matching, and ground-to-aerial point cloud merging, which are described in the following subsections.

Experimental results

In this section, the proposed architectural scene reconstruction method by merging the ground and aerial point clouds is evaluated. First, the four datasets used for method evaluation are presented. Then, the proposed ground-to-aerial image matching and ground-to-aerial point cloud merging methods are evaluated on these datasets.

Extension: from sparse to dense point cloud

In this section, we briefly introduce how to produce an integrated dense point cloud based on the merged sparse point clouds and cameras, which is a straightforward extension of this paper.

As the point clouds considered in this paper are sparse feature point clouds, in order to give a complete and detailed scene reconstruction, multiple-view stereo (MVS) should be performed to generate dense points, which is a standard procedure in image based modeling. Computing a depth-map for each image and then fusing the depth-maps produces the integrated dense point cloud.
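As a rough sketch of this standard procedure, the code below back-projects per-image depth maps into a common world frame and concatenates the results into one dense cloud. The camera convention x = K (R X + t) and the omission of cross-view consistency filtering are simplifying assumptions for illustration, not details of the paper's implementation.

```python
import numpy as np

def backproject_depth_map(depth, K, R, t, stride=2):
    """Back-project one MVS depth map into world-space points.

    depth: (H, W) per-pixel depths; K: 3x3 intrinsics; R, t: world-to-camera
    rotation and translation, so a world point X projects as x = K (R X + t).
    """
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(0, W, stride), np.arange(0, H, stride))
    d = depth[vs, us]
    valid = d > 0                                  # keep estimated pixels only
    pix = np.stack([us[valid], vs[valid], np.ones(valid.sum())])
    rays = np.linalg.inv(K) @ pix                  # normalized camera rays
    X_cam = rays * d[valid]                        # scale each ray by its depth
    X_world = R.T @ (X_cam - t.reshape(3, 1))      # invert the rigid transform
    return X_world.T                               # (N, 3) world points

def fuse_depth_maps(views):
    """Concatenate the back-projected points of all views into a dense cloud.
    views: iterable of (depth, K, R, t) tuples; the cross-view consistency
    filtering of standard MVS fusion is omitted in this sketch."""
    return np.vstack([backproject_depth_map(*v) for v in views])
```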

Conclusion

In this paper, the issue of 3D preservation of ancient Chinese architecture by merging the ground and aerial point clouds is addressed. We propose dealing with the ground-to-aerial image matching and ground-to-aerial point cloud merging problems in a unified framework. By taking advantage of the spatially continuous mesh, the aerial-view images can be synthesized without artifact holes, by which the differences in viewpoint and scale between ground and aerial images are largely eliminated and the ground and aerial images can be matched reliably.

Acknowledgement

This work was supported by the National Natural Science Foundation of China (NSFC) under grants 61333015, 61421004, 61632003, and 61473292.

References (54)

  • H. Bay et al., Speeded-up robust features (SURF), Comp. Vis. Image Understand. (2008)
  • X. Gao et al., Accurate and efficient ground-to-aerial model alignment, Patt. Recog. (2018)
  • F. Rottensteiner et al., Results of the ISPRS benchmark on urban object detection and 3D building reconstruction, ISPRS J. Photogram. Rem. Sens. (2014)
  • S. Agarwal et al., Building Rome in a day, Commun. ACM (2011)
  • S. Agarwal et al., Bundle adjustment in the large
  • A. Banno et al., Flying laser range sensor for large-scale site-modeling and its applications in Bayon digital archival project, Int. J. Comp. Vis. (2008)
  • M. Bansal et al., Ultrawide baseline facade matching for geo-localization
  • M. Bansal et al., Geo-localization of street views with aerial image databases
  • J. Bartelsen et al., Orientation and dense reconstruction of unordered terrestrial and aerial wide baseline image sets, ISPRS Ann. Photogram., Rem. Sens. Spat. Inf. Sci. (2012)
  • P.J. Besl et al., A method for registration of 3-D shapes, IEEE Trans. Patt. Anal. Mach. Intell. (1992)
  • A. Bódis-Szomorú et al., Efficient volumetric fusion of airborne and street-side data for urban reconstruction
  • H. Cui et al., HSfM: Hybrid structure-from-motion
  • H. Cui et al., Efficient large-scale structure from motion by fusing auxiliary imaging information, IEEE Trans. Image Process. (2015)
  • M.A. Fischler et al., Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM (1981)
  • Y. Furukawa et al., Accurate, dense, and robust multiview stereopsis, IEEE Trans. Patt. Anal. Mach. Intell. (2010)
  • M. Garland et al., Surface simplification using quadric error metrics
  • N. Greene et al., Hierarchical z-buffer visibility
  • Y. Guo et al., Rotational projection statistics for 3D local surface description and object recognition, Int. J. Comp. Vis. (2013)
  • R. Hartley et al., Multiple View Geometry in Computer Vision (2003)
  • J. Heinly et al., Reconstructing the world∗ in six days
  • K. Ikeuchi et al., The great buddha project: digitally archiving, restoring, and analyzing cultural heritage objects, Int. J. Comp. Vis. (2007)
  • M. Jancosek, T. Pajdla, Exploiting Visibility Information in Surface Reconstruction to Preserve Weakly... (2014)
  • H. Jégou et al., Improving bag-of-features for large scale image search, Int. J. Comp. Vis. (2010)
  • A.E. Johnson et al., Using spin images for efficient object recognition in cluttered 3D scenes, IEEE Trans. Patt. Anal. Mach. Intell. (1999)
  • F. Lafarge et al., Building large urban environments from unstructured point data
  • M. Li et al., Manhattan-world urban reconstruction from point clouds
  • W.-Y. Lin et al., RepMatch: Robust feature matching and pose for reconstructing modern cities