Ancient Chinese architecture 3D preservation by merging ground and aerial point clouds

https://doi.org/10.1016/j.isprsjprs.2018.04.023

Abstract

Ancient Chinese architecture 3D digitalization and documentation is a challenging task for the image based modeling community due to its architectural complexity and structural delicacy. Currently, an effective approach to ancient Chinese architecture 3D reconstruction is to merge two point clouds, separately obtained from ground and aerial images by the SfM technique. Two outstanding issues should be specially addressed: (1) it is difficult to find point matches between images from different sources due to their remarkable variations in viewpoint and scale; (2) due to the inevitable drift phenomenon in any SfM reconstruction process, the resulting two point clouds are no longer strictly related by a single similarity transformation, as they theoretically should be. To address these two issues, a new point cloud merging method is proposed in this work. Our method has the following characteristics: (1) the images are matched by leveraging sparse mesh based image synthesis; (2) the putative point matches are filtered by a geometrical consistency check and geometrical model verification; and (3) the two point clouds are merged via bundle adjustment by linking the ground-to-aerial tracks. Extensive experiments show that our method outperforms many state-of-the-art approaches in terms of ground-to-aerial image matching and point cloud merging.

Introduction

Ancient Chinese architecture is an important component of the world architecture system, and its most significant characteristic is the timber framework. Although this framework permits more delicate structures than many other architectural styles, it also makes ancient Chinese architecture more vulnerable to natural disasters, e.g. fire and earthquake. As a result, there is an urgent need to preserve ancient Chinese architecture, and one of the best means is to digitally preserve it by reconstructing complete and detailed 3D models (Ikeuchi et al., 2007, Banno et al., 2008).

3D digitalization of architecture is an intensive research topic in the fields of computer vision and computer graphics. There are usually two ways for data acquisition: active vision based laser scanning and passive vision based image capturing. Laser scanning based methods (Nan et al., 2010, Lafarge and Mallet, 2011, Li et al., 2016), which are widely used in urban scene reconstruction, are not suitable for 3D digitalization of ancient Chinese architecture. The reasons are twofold. First, since ancient Chinese architecture is structurally complicated, multi-viewpoint and close-range scanning is necessary to achieve a complete architectural model, which is inconvenient and impractical for cumbersome laser scanners, in particular for roof scanning. Second, projected lasers of certain frequencies may damage the materials and paintings of ancient Chinese architecture. In contrast, image capturing based methods possess the strengths of low cost and flexibility and are harmless to ancient Chinese architecture. As a result, image based methods are preferred in this paper.

Image based architectural scene reconstruction is a classical and fundamental problem in the research fields of computer vision (Snavely et al., 2008, Furukawa and Ponce, 2010, Cui et al., 2015, Zheng et al., 2014) and remote sensing (Bartelsen et al., 2012, Mancini et al., 2013, Rottensteiner et al., 2014). Thanks to recent developments in algorithm efficiency and hardware performance, reconstruction systems have been extended from single buildings to an urban scale (Agarwal et al., 2011), and nowadays even to a worldwide scale (Heinly et al., 2015). In order to achieve a complete 3D digitized model of ancient Chinese architecture that captures details of complex structures, e.g. cornices and brackets, two different sources of images, ground and aerial, are usually needed for close-range and large-scale photography (Bódis-Szomorú et al., 2016). When using both ground and aerial images, a common practice is to first reconstruct the ground and aerial point clouds separately and then merge them afterwards. Considering the noisy nature of 3D point clouds reconstructed from image collections, and the loss in 3D point clouds of the rich textural and contextual information of 2D images, it is preferable to merge the point clouds via 2D image feature point matching rather than by direct 3D point cloud registration, e.g. ICP (Besl and McKay, 1992). The difficulties of merging ground and aerial point clouds are illustrated in Fig. 1, where Fig. 1a shows an example pair of ground and aerial images and Fig. 1b shows the ground and aerial sparse point clouds reconstructed from the images. Note that the sparse point cloud in this paper consists of the feature points (e.g. SIFT features) reconstructed by the structure-from-motion (SfM) procedure, whereas the dense point cloud consists of the points reconstructed by the multi-view stereo (MVS) procedure via pixel-wise dense matching. Only the sparse point clouds are involved in the point cloud merging process. Fig. 1 shows that there are two key issues for architectural scene reconstruction from ground and aerial images: (1) how to match the ground and aerial images with substantial variations in viewpoint and scale (Fig. 1a), and (2) how to merge the ground and aerial point clouds, which exhibit drift phenomena and notable differences in noise level, density, and accuracy (Fig. 1b).
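For reference, the baseline that the drift phenomenon invalidates, aligning the two sparse point clouds by a single similarity transformation, can be sketched with the closed-form Umeyama (1991) solution below. This is not the method proposed in this paper; the function name and the assumption of known, outlier-free 3D point correspondences are illustrative only.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Estimate a similarity transform (s, R, t) with dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    Closed-form least-squares solution of Umeyama (1991).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)           # 3x3 cross-covariance matrix
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                           # guard against reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src     # isotropic scale
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Because SfM drift bends each reconstruction nonrigidly, no single (s, R, t) of this form can align the two clouds exactly, which is precisely why the paper merges them by bundle adjustment instead.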

To deal with the ground-to-aerial image matching problem, in this paper the ground image is warped to the viewpoint of the aerial image, by which the differences in viewpoint and scale between the two kinds of images are eliminated. Unlike the method proposed in Shan et al. (2014), which synthesizes the aerial-view image using the spatially discrete ground dense point cloud, the image synthesis method here resorts to the spatially continuous ground sparse mesh, which is reconstructed from the ground sparse point cloud. Then, the synthetic image is matched with the target aerial image by SIFT feature point extraction and matching. Subsequently, the putative point match outliers are filtered out by the following two techniques: (1) a consistency check of the feature scales and principal orientations between the point matches and (2) an affine transformation verification of the feature locations between the point matches.
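To make the two filtering techniques concrete, the following sketch matches a synthesized aerial-view image against a real aerial image with SIFT, checks the consistency of feature scales and principal orientations, and then verifies an affine model with RANSAC. It uses OpenCV for concreteness; the thresholds (ratio, scale_tol, angle_tol, and the reprojection threshold) are illustrative assumptions, not the values used in the paper.

```python
import cv2
import numpy as np

def match_and_filter(synth_img, aerial_img, ratio=0.8,
                     scale_tol=1.5, angle_tol=30.0):
    """Match a synthesized aerial-view image against a real aerial image,
    then filter putative matches by keypoint scale/orientation consistency
    and a RANSAC-verified affine model (illustrative thresholds)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(synth_img, None)
    kp2, des2 = sift.detectAndCompute(aerial_img, None)

    # Lowe's ratio test on putative nearest-neighbor matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    putative = [m for m, n in matcher.knnMatch(des1, des2, k=2)
                if m.distance < ratio * n.distance]

    # (1) Geometrical consistency check: the synthetic image is already
    # rendered at the aerial viewpoint, so matched features should have
    # similar SIFT scales and principal orientations.
    consistent = []
    for m in putative:
        s1, s2 = kp1[m.queryIdx].size, kp2[m.trainIdx].size
        da = abs(kp1[m.queryIdx].angle - kp2[m.trainIdx].angle) % 360.0
        da = min(da, 360.0 - da)
        if 1.0 / scale_tol < s1 / s2 < scale_tol and da < angle_tol:
            consistent.append(m)

    # (2) Geometrical model verification: fit an affine transformation of
    # the feature locations with RANSAC and keep only the inlier matches.
    pts1 = np.float32([kp1[m.queryIdx].pt for m in consistent])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in consistent])
    A, inlier_mask = cv2.estimateAffine2D(pts1, pts2,
                                          ransacReprojThreshold=4.0)
    if A is None:
        return []
    return [m for m, ok in zip(consistent, inlier_mask.ravel()) if ok]
```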

After matching the ground and aerial images, rather than aligning the point clouds by estimating a similarity transformation between them, the point clouds are merged together by a global bundle adjustment to deal with the possible scene drift phenomenon. To achieve that, the obtained point matches are linked to the original aerial tracks first, and then a global bundle adjustment is performed to merge the ground and aerial point clouds with the augmented aerial tracks and original ground tracks.
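The track-linking step can be sketched as a union-find merge over feature observations, producing the augmented tracks that a global bundle adjustment would then optimize. The data representation below (tracks as lists of (image_id, feature_id) observations) is an assumption for illustration; the paper's internal structures may differ, and the bundle adjustment itself is omitted.

```python
class TrackLinker:
    """Union-find over feature observations, used here to link ground
    tracks to aerial tracks through verified ground-to-aerial matches
    (a minimal sketch; an observation is an (image_id, feature_id) pair)."""

    def __init__(self):
        self.parent = {}

    def find(self, obs):
        self.parent.setdefault(obs, obs)
        while self.parent[obs] != obs:
            self.parent[obs] = self.parent[self.parent[obs]]  # path halving
            obs = self.parent[obs]
        return obs

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def link_tracks(ground_tracks, aerial_tracks, cross_matches):
    """Merge ground and aerial tracks into augmented tracks.

    ground_tracks, aerial_tracks: lists of observation lists.
    cross_matches: (ground_obs, aerial_obs) pairs from image matching.
    """
    uf = TrackLinker()
    for track in ground_tracks + aerial_tracks:
        for obs in track[1:]:
            uf.union(track[0], obs)           # observations of one track
    for g_obs, a_obs in cross_matches:
        uf.union(g_obs, a_obs)                # the ground-to-aerial links
    merged = {}
    for track in ground_tracks + aerial_tracks:
        merged.setdefault(uf.find(track[0]), []).extend(track)
    return list(merged.values())
```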

This work has the following three main contributions: (1) the aerial-view image is synthesized based on the ground sparse mesh, (2) the putative ground-to-aerial point matches are filtered by geometrical consistency check and geometrical model verification, and (3) the ground and aerial point clouds are merged via bundle adjustment by linking the ground-to-aerial tracks.

The rest of this paper is organized as follows: Section 2 introduces some related works. Our proposed method is described in Section 3 and evaluated in Section 4. Section 5 gives an extension of our proposed method. Finally, Section 6 offers some concluding remarks.

Section snippets

Related work

There are four main categories of works related to ours: ground-to-aerial image matching; point match outlier filtering; image synthesis and rendering; and ground-to-aerial point cloud alignment.

Proposed method

In this paper, complete architectural scene reconstruction is achieved by first matching the ground and aerial images and then merging the ground and aerial point clouds. The pipeline of the proposed method is shown in Fig. 2. The inputs of the method are the ground and aerial images, and the outputs are the merged ground and aerial sparse point clouds. The method contains three main steps: pre-processing, ground-to-aerial image matching, and ground-to-aerial point cloud merging, which are described in the following subsections.

Experimental results

In this section, the proposed architectural scene reconstruction method by merging the ground and aerial point clouds is evaluated. First, the four datasets used for method evaluation are presented. Then, the proposed ground-to-aerial image matching and ground-to-aerial point cloud merging methods are evaluated on these datasets.

Extension: from sparse to dense point cloud

In this section, we briefly introduce how to produce an integrated dense point cloud based on the merged sparse point clouds and cameras, which is a straightforward extension of this paper.

As the point clouds considered in this paper are sparse feature point clouds, in order to give a complete and detailed scene reconstruction, multiple-view stereo (MVS) should be performed to generate dense points, which is a standard procedure in image based modeling. Computing a depth-map for each image and then fusing the depth-maps produces the integrated dense point cloud.
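As a rough sketch of this standard procedure, the code below back-projects per-image depth maps into a common world frame and concatenates the results into one dense cloud. The camera convention x = K (R X + t) and the omission of cross-view consistency filtering are simplifying assumptions for illustration, not details of the paper's implementation.

```python
import numpy as np

def backproject_depth_map(depth, K, R, t, stride=2):
    """Back-project one MVS depth map into world-space points.

    depth: (H, W) per-pixel depths; K: 3x3 intrinsics; R, t: world-to-camera
    rotation and translation, so a world point X projects as x = K (R X + t).
    """
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(0, W, stride), np.arange(0, H, stride))
    d = depth[vs, us]
    valid = d > 0                                  # keep estimated pixels only
    pix = np.stack([us[valid], vs[valid], np.ones(valid.sum())])
    rays = np.linalg.inv(K) @ pix                  # normalized camera rays
    X_cam = rays * d[valid]                        # scale each ray by its depth
    X_world = R.T @ (X_cam - t.reshape(3, 1))      # invert the rigid transform
    return X_world.T                               # (N, 3) world points

def fuse_depth_maps(views):
    """Concatenate the back-projected points of all views into a dense cloud.
    views: iterable of (depth, K, R, t) tuples; the cross-view consistency
    filtering of standard MVS fusion is omitted in this sketch."""
    return np.vstack([backproject_depth_map(*v) for v in views])
```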

Conclusion

In this paper, the issue of 3D preservation of ancient Chinese architecture by merging the ground and aerial point clouds is addressed. We propose dealing with the ground-to-aerial image matching and ground-to-aerial point cloud merging problems in a unified framework. By taking advantage of the spatially continuous mesh, the aerial-view images can be synthesized without artifact holes, by which the differences in viewpoint and scale between ground and aerial images are largely eliminated and the ground and aerial images can be matched reliably.

Acknowledgement

This work was supported by the National Natural Science Foundation of China (NSFC) under grants 61333015, 61421004, 61632003, and 61473292.

References (54)

  • H. Bay et al., Speeded-up robust features (SURF), Comp. Vis. Image Understand. (2008)
  • X. Gao et al., Accurate and efficient ground-to-aerial model alignment, Patt. Recog. (2018)
  • F. Rottensteiner et al., Results of the ISPRS benchmark on urban object detection and 3D building reconstruction, ISPRS J. Photogram. Rem. Sens. (2014)
  • S. Agarwal et al., Building Rome in a day, Commun. ACM (2011)
  • S. Agarwal et al., Bundle adjustment in the large
  • A. Banno et al., Flying laser range sensor for large-scale site-modeling and its applications in Bayon digital archival project, Int. J. Comp. Vis. (2008)
  • M. Bansal et al., Ultrawide baseline facade matching for geo-localization
  • M. Bansal et al., Geo-localization of street views with aerial image databases
  • J. Bartelsen et al., Orientation and dense reconstruction of unordered terrestrial and aerial wide baseline image sets, ISPRS Ann. Photogram., Rem. Sens. Spat. Inf. Sci. (2012)
  • P.J. Besl et al., A method for registration of 3-D shapes, IEEE Trans. Patt. Anal. Mach. Intell. (1992)
  • A. Bódis-Szomorú et al., Efficient volumetric fusion of airborne and street-side data for urban reconstruction
  • H. Cui et al., HSfM: Hybrid structure-from-motion
  • H. Cui et al., Efficient large-scale structure from motion by fusing auxiliary imaging information, IEEE Trans. Image Process. (2015)
  • M.A. Fischler et al., Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM (1981)
  • Y. Furukawa et al., Accurate, dense, and robust multiview stereopsis, IEEE Trans. Patt. Anal. Mach. Intell. (2010)
  • M. Garland et al., Surface simplification using quadric error metrics
  • N. Greene et al., Hierarchical z-buffer visibility
  • Y. Guo et al., Rotational projection statistics for 3D local surface description and object recognition, Int. J. Comp. Vis. (2013)
  • R. Hartley et al., Multiple View Geometry in Computer Vision (2003)
  • J. Heinly et al., Reconstructing the world∗ in six days
  • K. Ikeuchi et al., The great buddha project: digitally archiving, restoring, and analyzing cultural heritage objects, Int. J. Comp. Vis. (2007)
  • M. Jancosek, T. Pajdla, Exploiting Visibility Information in Surface Reconstruction to Preserve Weakly... (2014)
  • H. Jégou et al., Improving bag-of-features for large scale image search, Int. J. Comp. Vis. (2010)
  • A.E. Johnson et al., Using spin images for efficient object recognition in cluttered 3D scenes, IEEE Trans. Patt. Anal. Mach. Intell. (1999)
  • F. Lafarge et al., Building large urban environments from unstructured point data
  • M. Li et al., Manhattan-world urban reconstruction from point clouds
  • W.-Y. Lin et al., RepMatch: Robust feature matching and pose for reconstructing modern cities