Ancient Chinese architecture 3D preservation by merging ground and aerial point clouds
Introduction
Ancient Chinese architecture is an important component of the world's architectural heritage, and its most distinctive characteristic is the timber framework. Although this framework allows more delicate structures than many other architectural styles, it also makes ancient Chinese architecture more vulnerable to natural disasters, e.g. fire and earthquake. There is therefore an urgent need to preserve ancient Chinese architecture, and one of the best means is digital preservation by reconstructing complete and detailed 3D models (Ikeuchi et al., 2007; Banno et al., 2008).
3D digitization of architecture is an intensive research topic in the fields of computer vision and computer graphics. There are usually two ways to acquire data: active laser scanning and passive image capturing. Laser scanning based methods (Nan et al., 2010; Lafarge and Mallet, 2011; Li et al., 2016), which are widely used in urban scene reconstruction, are not well suited to 3D digitization of ancient Chinese architecture, for two reasons. First, because ancient Chinese architecture is structurally complicated, multi-viewpoint and close-range scanning are necessary to obtain a complete architectural model, which is inconvenient and impractical with cumbersome laser scanners, in particular for roof scanning. Second, projected lasers of certain frequencies may damage the materials and paintings of ancient Chinese architecture. In contrast, image capturing based methods are low-cost, flexible, and harmless to ancient Chinese architecture. As a result, image based methods are preferred in this paper.
Image based architectural scene reconstruction is a classical and fundamental problem in the research fields of computer vision (Snavely et al., 2008; Furukawa and Ponce, 2010; Cui et al., 2015; Zheng et al., 2014) and remote sensing (Bartelsen et al., 2012; Mancini et al., 2013; Rottensteiner et al., 2014). Thanks to recent advances in algorithm efficiency and hardware performance, reconstruction systems have been extended from single buildings to the urban scale (Agarwal et al., 2011) and even to a worldwide scale (Heinly et al., 2015). To achieve a complete 3D digitized model of ancient Chinese architecture that captures the details of complex structures, e.g. cornices and brackets, two different sources of images, ground and aerial, are usually needed for close-range and large-scale photography (Bódis-Szomorú et al., 2016). When using both ground and aerial images, a common practice is first to reconstruct the ground and aerial point clouds separately and then to merge them. Considering the noisy nature of 3D point clouds reconstructed from image collections, and the fact that the rich textural and contextual information of 2D images is lost in 3D point clouds, it is preferable to merge the point clouds via 2D image feature point matching rather than by direct 3D point cloud registration, e.g. ICP (Besl and McKay, 1992). The difficulties of merging ground and aerial point clouds are illustrated in Fig. 1, where Fig. 1a shows an example pair of ground and aerial images and Fig. 1b shows the ground and aerial sparse point clouds reconstructed from them. Note that in this paper the sparse point cloud consists of the feature points (e.g. SIFT features) reconstructed by the structure-from-motion (SfM) procedure, whereas the dense point cloud consists of the points reconstructed by the multi-view stereo (MVS) procedure via pixel-wise dense matching; only the sparse point cloud is involved in the point cloud merging process. Fig. 1 shows that there are two key issues in architectural scene reconstruction from ground and aerial images: (1) how to match the ground and aerial images despite substantial variations in viewpoint and scale (Fig. 1a), and (2) how to merge the ground and aerial point clouds given drift phenomena and notable differences in noise level, density and accuracy (Fig. 1b).
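The direct-registration alternative mentioned above, aligning two clouds of corresponding 3D points by a single similarity transformation, can be sketched with the standard closed-form (Umeyama-style) least-squares solution. The following NumPy sketch is our own illustration of that baseline, not part of the paper; the function name and interface are assumptions:

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Estimate scale s, rotation R and translation t that minimize
    ||dst - (s * R @ src + t)||^2 over corresponding point sets.

    src, dst: (N, 3) arrays of corresponding 3D points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    sc, dc = src - mu_s, dst - mu_d
    # Cross-covariance of the centered correspondences.
    cov = dc.T @ sc / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Reflection guard: force a proper rotation (det(R) = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (sc ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```

On noise-free correspondences this recovers the transformation exactly; with the drift and noise differences shown in Fig. 1b, a single global similarity is precisely what becomes insufficient, which motivates the bundle-adjustment-based merging used in this paper.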
To deal with the ground-to-aerial image matching problem, the ground image is warped to the viewpoint of the aerial image, which eliminates the differences in viewpoint and scale between the two kinds of images. Unlike the method proposed in Shan et al. (2014), which synthesizes the aerial-view image from the spatially discrete ground dense point cloud, the image synthesis method here uses the spatially continuous ground sparse mesh, reconstructed from the ground sparse point cloud. The synthetic image is then matched with the target aerial image by SIFT feature point extraction and matching. Subsequently, putative point match outliers are filtered out by two techniques: (1) a consistency check of the feature scales and principal orientations of the point matches, and (2) an affine transformation verification of the feature locations of the point matches.
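The first filtering technique, the consistency check of feature scales and principal orientations, can be sketched as follows. This is a simplified illustration: the median-based voting and the threshold values are our assumptions, not the paper's exact formulation:

```python
import numpy as np

def filter_by_scale_orientation(scales1, oris1, scales2, oris2,
                                log_scale_tol=0.5, ori_tol_deg=20.0):
    """Keep putative matches whose scale ratio and principal-orientation
    difference agree with the dominant (median) values over all matches.

    scales1, scales2: feature scales of the matched keypoints (arrays).
    oris1, oris2: principal orientations in degrees (arrays).
    Returns a boolean mask of matches to keep."""
    # Since the synthetic image is already in the aerial viewpoint,
    # correct matches should share a roughly common scale ratio and
    # a roughly common orientation offset.
    log_ratio = np.log(np.asarray(scales2) / np.asarray(scales1))
    d_ori = (np.asarray(oris2) - np.asarray(oris1) + 180.0) % 360.0 - 180.0
    keep = (np.abs(log_ratio - np.median(log_ratio)) < log_scale_tol) & \
           (np.abs(d_ori - np.median(d_ori)) < ori_tol_deg)
    return keep
```

The surviving matches would then go to the second stage, an affine model verification of the feature locations (e.g. RANSAC-fitted, as in Fischler and Bolles' random sample consensus cited in the references).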
After matching the ground and aerial images, rather than aligning the point clouds by estimating a similarity transformation between them, the point clouds are merged by a global bundle adjustment to handle the possible scene drift phenomenon. To achieve this, the obtained point matches are first linked into the original aerial tracks, and then a global bundle adjustment is performed to merge the ground and aerial point clouds using the augmented aerial tracks and the original ground tracks.
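The track-linking step described above can be sketched with a simple data layout in which each track is a list of (image_id, feature_id) observations. The representation and function name are our assumptions, and the subsequent bundle adjustment itself is omitted:

```python
def augment_tracks(aerial_tracks, ground_to_aerial_matches):
    """Append ground observations to the aerial tracks they were matched to.

    aerial_tracks: dict mapping track_id -> list of (image_id, feature_id)
        observations of one 3D point in the aerial images.
    ground_to_aerial_matches: list of ((g_img, g_feat), (a_img, a_feat))
        pairs from the ground-to-aerial image matching stage.
    Returns a new dict of augmented tracks; the inputs are not modified."""
    # Index every aerial observation by the track it belongs to.
    obs_to_track = {obs: tid for tid, obs_list in aerial_tracks.items()
                    for obs in obs_list}
    augmented = {tid: list(obs_list) for tid, obs_list in aerial_tracks.items()}
    for g_obs, a_obs in ground_to_aerial_matches:
        tid = obs_to_track.get(a_obs)
        # Matches whose aerial feature belongs to no reconstructed track
        # cannot be linked and are skipped.
        if tid is not None and g_obs not in augmented[tid]:
            augmented[tid].append(g_obs)
    return augmented
```

Each augmented track then constrains one 3D point through both ground and aerial cameras, which is what allows the global bundle adjustment to pull the two reconstructions into a single consistent frame.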
This work has the following three main contributions: (1) the aerial-view image is synthesized based on the ground sparse mesh, (2) the putative ground-to-aerial point matches are filtered by geometrical consistency check and geometrical model verification, and (3) the ground and aerial point clouds are merged via bundle adjustment by linking the ground-to-aerial tracks.
The rest of this paper is organized as follows: Section 2 reviews related work. The proposed method is described in Section 3 and evaluated in Section 4. Section 5 presents an extension of the proposed method, and Section 6 offers concluding remarks.
Related work
There are four main categories of work related to ours: ground-to-aerial image matching, point match outlier filtering, image synthesis and rendering, and ground-to-aerial point cloud alignment.
Proposed method
In this paper, complete architectural scene reconstruction is achieved by first matching the ground and aerial images and then merging the ground and aerial point clouds. The pipeline of the proposed method is shown in Fig. 2. The inputs of the method are the ground and aerial images, and the outputs are the merged ground and aerial sparse point clouds. The method contains three main steps: pre-processing, ground-to-aerial image matching, and ground-to-aerial point cloud merging, which are described in the following subsections.
Experimental results
In this section, the proposed architectural scene reconstruction method by merging ground and aerial point clouds is evaluated. First, the four datasets used for evaluation are presented. Then, the proposed ground-to-aerial image matching and ground-to-aerial point cloud merging methods are evaluated on these datasets.
Extension: from sparse to dense point cloud
In this section, we briefly introduce how to produce an integrated dense point cloud based on the merged sparse point clouds and cameras, which is a straightforward extension of this paper.
As the point clouds considered in this paper are sparse feature point clouds, multi-view stereo (MVS) should be performed to generate dense points and thereby give a complete and detailed scene reconstruction; this is a standard procedure in image based modeling. Computing a depth-map for each image
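The per-image depth-maps produced by MVS are typically back-projected into world space and fused into one dense cloud. Below is a minimal back-projection sketch, assuming a pinhole intrinsic matrix K and a world-to-camera pose (R, t); it is our own illustration, not the paper's exact MVS procedure:

```python
import numpy as np

def backproject_depth(depth, K, R, t):
    """Back-project a depth map into world-space 3D points.

    depth: (H, W) array of depths along the camera z-axis (0 = invalid).
    K: (3, 3) pinhole intrinsic matrix.
    R, t: world-to-camera pose, i.e. x_cam = R @ x_world + t.
    Returns an (N, 3) array of world-space points for the valid pixels."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    valid = depth > 0
    # Homogeneous pixel coordinates of the valid pixels, shape (3, N).
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())])
    rays = np.linalg.inv(K) @ pix      # viewing rays with unit z
    cam_pts = rays * depth[valid]      # points in the camera frame
    world_pts = R.T @ (cam_pts - t[:, None])  # invert the rigid pose
    return world_pts.T
```

Because all cameras share one coordinate frame after the merged bundle adjustment, the back-projected points of every ground and aerial depth-map land directly in the same space, so fusion reduces to concatenating (and optionally filtering) the per-image clouds.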
Conclusion
In this paper, the issue of 3D preservation of ancient Chinese architecture by merging ground and aerial point clouds is addressed. We propose dealing with the ground-to-aerial image matching and ground-to-aerial point cloud merging problems in a unified framework. By taking advantage of the spatially continuous mesh, the aerial-view images can be synthesized without artifact holes, by which the differences in viewpoint and scale between ground and aerial images are largely eliminated and the ground and aerial images can be matched reliably.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (NSFC) under grants 61333015, 61421004, 61632003, and 61473292.
References (54)
- Speeded-up robust features (SURF). Comp. Vis. Image Understand. (2008)
- Accurate and efficient ground-to-aerial model alignment. Patt. Recog. (2018)
- Results of the ISPRS benchmark on urban object detection and 3D building reconstruction. ISPRS J. Photogram. Rem. Sens. (2014)
- Building Rome in a day. Commun. ACM (2011)
- Bundle adjustment in the large
- Flying laser range sensor for large-scale site-modeling and its applications in Bayon digital archival project. Int. J. Comp. Vis. (2008)
- Ultrawide baseline facade matching for geo-localization
- Geo-localization of street views with aerial image databases
- Orientation and dense reconstruction of unordered terrestrial and aerial wide baseline image sets. ISPRS Ann. Photogram., Rem. Sens. Spat. Inf. Sci. (2012)
- A method for registration of 3-D shapes. IEEE Trans. Patt. Anal. Mach. Intell. (1992)
- Efficient volumetric fusion of airborne and street-side data for urban reconstruction
- HSfM: Hybrid structure-from-motion
- Efficient large-scale structure from motion by fusing auxiliary imaging information. IEEE Trans. Image Process.
- Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM
- Accurate, dense, and robust multiview stereopsis. IEEE Trans. Patt. Anal. Mach. Intell.
- Surface simplification using quadric error metrics
- Hierarchical z-buffer visibility
- Rotational projection statistics for 3D local surface description and object recognition. Int. J. Comp. Vis.
- Multiple View Geometry in Computer Vision
- Reconstructing the world* in six days
- The great Buddha project: digitally archiving, restoring, and analyzing cultural heritage objects. Int. J. Comp. Vis.
- Improving bag-of-features for large scale image search. Int. J. Comp. Vis.
- Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Trans. Patt. Anal. Mach. Intell.
- Building large urban environments from unstructured point data
- Manhattan-world urban reconstruction from point clouds
- RepMatch: Robust feature matching and pose for reconstructing modern cities