Review
Automated digital modeling of existing buildings: A review of visual object recognition methods

https://doi.org/10.1016/j.autcon.2020.103131

Highlights

  • Review of building related object recognition methods

  • Coverage of building component classes reported using a building taxonomy

  • Object recognition performances summarized and reported

  • A simple conceptualization of object recognition systems presented

Abstract

Digital building representations enable and promote new forms of simulation, automation, and information sharing. However, creating and maintaining these representations is prohibitively expensive. In an effort to make the adoption of this technology easier, researchers have been automating the digital modeling of existing buildings by applying reality capture devices and computer vision algorithms. This article is a summary of the efforts of the past ten years, with a particular focus on object recognition methods. We rectify three limitations of existing review articles by describing the general structure and variations of object recognition systems and performing an extensive and quantitative comparative performance evaluation. The coverage of building component classes (i.e. semantic coverage) and recognition performances are reported in-depth and framed using a building taxonomy. Research programs demonstrate sparse semantic coverage with a clear bias towards recognizing floor, wall, ceiling, door, and window classes. Comprehensive semantic coverage of building infrastructure will require a radical scaling and diversification of efforts.

Introduction

In 2017, the stored value of all global real estate was US$228 trillion with commercial real estate comprising US$32.3 trillion [138]. Assets in this system are subject to adaptive reuse, addition, alteration, preservation, reconstruction, recycle, rehabilitation, remodeling, renovation, restoration, retrofit, repair, and cleaning [123]. Optimization of these operations is facilitated by the development of information systems. Companies in almost every industry are building digital assets, expanding digital usage, and creating a more digital workforce [27]. Building information modeling (BIM) is the building industry's contribution to this digitization trend. BIM is a system of digital building representation (Fig. 1) created to give architects, engineers, contractors, and facility managers more control over the built environment by enabling and promoting new forms of simulation, automation, and information sharing.

A comprehensive list of well established and emerging building information model (BIM) uses was assembled by Change Agents as part of their BIMe initiative [132]. Their list reveals the vast diversity of information BIMs can contain. Since every model has limited capacity and could not possibly represent every detail of a building, modeling is fundamentally a process of selection. The modeler makes inclusion decisions based on relevance and convenience. “We try to abstract only the information essential for our objectives while ignoring other information” (p. 1) [92]. Geometrical information is often relevant when representing buildings and is, therefore, the foundation of every BIM [20].

Geometric information describes the shape of a building. This includes both the shape of individual building components as well as the way these building components are arranged to make up the building (e.g. aggregation relationships, topological relationships, and directional relationships [135]). Selecting the level of fidelity is central to geometrical modeling. Triangle meshes and point clouds [153] can preserve a high degree of surface detail and are often used for higher fidelity representation. Lower fidelity representations can be made by approximating building components using geometric primitives [82,83,161] such as planes [119], lines, rectangles, cuboids [150], circles [40], ellipses, arcs, spheres, cylinders, cones, ellipsoids, and composites. These geometric primitives provide a compact set of parameters/features used to describe the geometry. More complex curved surfaces can be represented by Bezier splines and non-uniform rational basis splines (NURBS) [12,13,19,48,50]. Decreasing the fidelity of a representation sacrifices flexibility and surface detail for storage efficiency. The literature refers to these varying degrees of fidelity using several dichotomies (Table 1).
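The primitive-approximation idea above can be illustrated with plane fitting, the simplest and most common case (walls, floors, ceilings). The sketch below is a minimal least-squares fit via SVD, not a reconstruction of any specific method from the reviewed literature:

```python
import numpy as np

def fit_plane(points):
    """Fit a plane to an (N, 3) point array by least squares (SVD).

    Returns (centroid, unit normal). The normal is the right singular
    vector associated with the smallest singular value of the centered
    points, i.e. the direction of least variance.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

# Noisy samples drawn from the plane z = 0.
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-1, 1, (100, 2)),
                       rng.normal(0, 0.01, 100)])
c, n = fit_plane(pts)
print(abs(n[2]))  # close to 1: the recovered normal is ~(0, 0, ±1)
```

A plane thus reduces thousands of points to four parameters (a point and a normal), which is precisely the fidelity-for-compactness trade described above.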

With geometry as the foundation, modelers augment the representation with object metadata (i.e. semantics). This metadata can relate to any facet of the built environment, including design, engineering, fabrication, construction, and facility management [132]. For example, a window could potentially have metadata including: unit cost, area, weight, material, manufacturer, supplier, heat transfer coefficient, adjoining room name, and installation date. These semantics can be used to perform analysis and computation far exceeding the capabilities provided by geometry alone.
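How such metadata augments geometry can be sketched in a few lines. The component class, field names, and values below are hypothetical, chosen only to mirror the window example above:

```python
from dataclasses import dataclass, field

@dataclass
class BuildingComponent:
    """A component pairing geometric parameters with semantic metadata."""
    name: str
    geometry: dict                       # e.g. a primitive and its parameters
    metadata: dict = field(default_factory=dict)

window = BuildingComponent(
    name="window-01",
    geometry={"primitive": "rectangle", "width_m": 1.2, "height_m": 1.5},
    metadata={
        "unit_cost_usd": 350.0,
        "material": "aluminum, double-glazed",
        "heat_transfer_coefficient_w_m2k": 1.4,
        "adjoining_room": "Office 2.14",
    },
)

# Semantics enable computation beyond geometry alone, e.g. a thermal query:
u_value = window.metadata["heat_transfer_coefficient_w_m2k"]
area_m2 = window.geometry["width_m"] * window.geometry["height_m"]
print(u_value * area_m2)  # UA value in W/K
```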

BIM-based analysis, simulation, and automation often span several information systems. Interoperability between information systems is improved by following standards that define building taxonomies. Example standards include Uniformat [31], OmniClass [137], and Industry Foundation Classes (IFC) [71]. Uniformat and Table 21 of the OmniClass system provide a standardized basis for classifying information using hierarchies of construction elements. The OmniClass system defines an element as: a major component, assembly, or “construction entity part which, in itself or in combination with other parts, fulfills a predominating function of the construction entity” (ISO 12006-2) [72]. IFC was developed by the International Alliance for Interoperability, now known as buildingSMART International. IFC provides a formalized representation of typical building components (e.g., wall, door), attributes (e.g., type, function, geometric description), relationships (e.g., physical relationships, such as supported-by, connected-to), and more abstract concepts, such as schedules, activities, spaces, and construction costs, in the form of entities [77].
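The entity-and-relationship structure IFC formalizes can be sketched abstractly. The classes below are a deliberately simplified illustration of that structure, not the actual IFC schema or any IFC toolkit's API:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """Minimal IFC-style entity: a typed object carrying attributes."""
    ifc_type: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Relationship:
    """IFC expresses links such as supported-by and connected-to as
    first-class objects relating one entity to another."""
    rel_type: str
    relating: Entity
    related: Entity

wall = Entity("IfcWall", {"Name": "W-101"})
door = Entity("IfcDoor", {"Name": "D-07", "OverallHeight": 2.1})
rel = Relationship("connected-to", relating=door, related=wall)

print(f"{rel.relating.attributes['Name']} is "
      f"{rel.rel_type} {rel.related.attributes['Name']}")
```

Because relationships are objects rather than implicit pointers, downstream systems can query and exchange them like any other piece of building information.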

Despite its superior capabilities for managing information, BIM has yet to enable multi-decade information stewardship [116]. The organizations that have adopted BIM find the upkeep of models [12,70] prohibitively expensive and facility operators in the United States continue to spend $4.8 billion annually searching for, validating, and recreating facility information [52]. In an effort to make BIM adoption easier, researchers have been automating parts of BIM creation by applying reality capture devices, computer vision, and 3D modeling algorithms [32,51,105,135,143,151]. The process has four steps (Fig. 2):

  • 3D Reconstruction - the existing building is digitized using reality capture technologies, e.g. laser scanners and range cameras,

  • Semantic modeling - subsets of the 3D reconstruction are sorted into semantic classes defined by a BIM taxonomy, e.g. wall, door, window, and column,

  • Geometrical modeling - the shape of each class instance and the spatial relationships between class instances are described using geometric parameters, e.g. height, elevation, equal height to, parallel to, and

  • Building information modeling - semantics and geometrical parameters are used to generate building components in a BIM authoring software.
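The four-step process above can be sketched as a data pipeline. Every function body here is a toy placeholder (e.g. labeling by height alone); in practice each step is an active research area in its own right:

```python
def reconstruct(scans):
    """3D reconstruction: fuse captured scans into one point set (placeholder)."""
    return [pt for scan in scans for pt in scan]

def semantic_model(points, taxonomy):
    """Semantic modeling: assign each point a class label.
    Placeholder rule: classify purely by height (z)."""
    return [(p, "ceiling" if p[2] > 2.5 else "floor" if p[2] < 0.1 else "wall")
            for p in points]

def geometric_model(labeled):
    """Geometrical modeling: summarize each class instance with parameters."""
    classes = {}
    for p, label in labeled:
        classes.setdefault(label, []).append(p)
    return {label: {"count": len(pts)} for label, pts in classes.items()}

def build_bim(params):
    """BIM generation: emit components for an authoring tool (placeholder)."""
    return [{"class": label, **attrs} for label, attrs in params.items()]

scans = [[(0, 0, 0.0), (1, 0, 2.8)], [(0, 1, 1.4)]]
bim = build_bim(geometric_model(semantic_model(reconstruct(scans), taxonomy=None)))
print(sorted(c["class"] for c in bim))  # ['ceiling', 'floor', 'wall']
```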

Many reviewed publications address automated geometrical modeling, fitting polyhedral models to 3D reconstructions and imposing additional constraints [110,163], such as alignment with gravity [150], co-linearity [50], rectangularity, and parallelism [24]. In fact, automation of geometrical modeling has progressed to the point where various commercial software applications are available on the market (e.g., EdgeWise by ClearEdge3D, CloudWorx by Digital Scan 3D, PolyWorks Modeler by InnovMetric, RealWorks by Trimble) that perform semi-automated generation of BIMs and computer-aided design (CAD) models from various sources [84].
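One common form such constraints take is axis regularization: snapping fitted plane normals to a set of dominant directions so that nominally parallel or perpendicular walls become exactly so. The sketch below assumes a Manhattan-world set of axes and is illustrative only, not a specific reviewed method:

```python
import numpy as np

def snap_to_axes(normals, axes=np.eye(3)):
    """Regularize plane normals by snapping each to the nearest dominant
    axis, enforcing parallelism/orthogonality among the fitted planes."""
    snapped = []
    for n in normals:
        n = n / np.linalg.norm(n)
        scores = np.abs(axes @ n)           # |cosine| with each axis
        axis = axes[np.argmax(scores)]
        snapped.append(np.sign(n @ axis) * axis)
    return np.array(snapped)

# Two nearly-vertical wall normals become exactly parallel after snapping.
raw = [np.array([0.99, 0.02, 0.05]), np.array([-1.0, 0.03, -0.01])]
reg = snap_to_axes(raw)
print(np.cross(reg[0], reg[1]))  # zero vector: the planes are now parallel
```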

Semantic modeling encompasses a number of methods that transform input digital representations (i.e. 3D reconstructions) into more abstract semantic representations. Published articles typically focus narrowly on a few semantic object categories, such as plumbing [129], structural steel [21], and walls, floors, doors, and windows [8,16,94,101,111].

Due to the diversity of objects and systems encountered in buildings as well as the diversity of applications and contexts these objects and systems are found in, no method has yet been able to fulfil the promise of comprehensive semantically-rich BIM creation.

On our research community's mission to successfully contend with the diversity of buildings, review articles provide frameworks that guide our community's process of combining, scaling, and creating visual object recognition methods. Here we identify three limitations of the review articles in Table 2. These limitations relate to method evaluation: (1) understanding fundamental function, (2) inferring scope of effectiveness, and (3) performing an extensive and quantitative comparative performance evaluation.

The review articles in Table 2 partition semantic modeling into processes including: feature extraction (geometric, radiometric, colorimetric, and contextual), segmentation, object recognition, and classification [32]; geometric primitive detection, point cloud clustering, shape fitting, one-to-one matching [105]; object identification, extraction of relational and semantic information, data-driven feature/shape/material/statistical matching, model-based knowledge and contextual information based matching [143]. These review articles successfully account for the many abstraction methods involved in semantic modeling, but fail to present these methods with any explanation of their fundamental function. For example, Tang et al. [135] states that these abstraction methods are necessary because “explicit shape [lower abstraction] representations… are not very well suited for… automatically segmenting or recognizing building components”. The article does not elaborate on why low abstraction representations are not well suited and why higher abstraction representations are needed. A hint of insight about fundamental function is found in Pătrăucean et al. [105] when it states that these abstraction methods “encode, as compactly as possible, the distinctiveness of an object.” This distillation of distinct (discriminating) features enables classifiers to more effectively perform object recognition. This kind of clarity about function enables researchers to accurately interpret parts of methods as being vital or unnecessary. Understanding fundamental function is especially important when adopting parts of existing methods and is currently missing in the review articles.

Inferring scope of effectiveness is a critical step in:

  • (1)

    applying existing methods to new data, new contexts, and new representations or as Tang et al. [135] states, determining “extensibility to new environments. Can the algorithm be extended to handle new types of objects? Can the algorithm be applied to different types of environments, or is it specific to one or a certain class of spaces?” and

  • (2)

    designing new methods to effectively process data falling outside the scope of effectiveness of other existing methods

Existing methods are typically validated on a particular domain represented by a validation dataset. The existing method's scope of effectiveness is defined by all domains that are sufficiently similar to the domain represented by the validation dataset. “Sufficiently” depends on the method's generalization [102] ability. Any discussion of similarity between validation domains and target domains requires a rigorous definition of the possible dimensions of similarity. These dimensions of similarity follow directly from any framework for describing data (i.e. representations). Pătrăucean et al. [105] describes representations by completeness, compactness, and uniqueness. Both Pătrăucean et al. [105] and Tang et al. [135] describe representations as explicit, implicit, parametric, and non-parametric. It can be argued that a method validated on implicit representations will have other similar implicit representations within its scope of effectiveness. Although accurate, this determination lacks the requisite detail to be useful. On the other hand, Tang et al. [135] presents a second and more rigorous framework for describing representations that includes the dimensions:

  • types of objects present

  • level of sensor noise

  • level of occlusion

  • level of clutter

  • presence of moving objects

  • presence of specular surfaces

  • presence of dark (low-reflectance) surfaces

  • sparseness of data

Comparing a validation domain and target domain using these dimensions provides a similarity analysis with greater detail. However, even these dimensions have shortcomings. “Level of clutter” is not an objective description of digital representations, since it will change depending on the particular recognition task and its definition of clutter. The relevance of “presence of specular and dark surfaces” changes depending on the computer vision hardware used to collect the digital representations, since these types of surfaces cause erroneous measurements with some hardware [104] and not with others. In order to rigorously and objectively infer a method's scope of effectiveness, the community requires a new framework for describing representations. This framework must be: comprehensive, agnostic to computer-vision hardware, and agnostic to the particular recognition task the representation is used for.
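A domain-similarity comparison along dimensions like these can be made concrete. The descriptor fields, units, and tolerance rule below are hypothetical simplifications for illustration, not a scheme proposed by any of the reviewed articles:

```python
from dataclasses import dataclass, asdict

@dataclass
class DomainDescription:
    """A dataset described along a few of the dimensions listed above
    (illustrative fields and units only)."""
    object_types: frozenset
    sensor_noise: float      # e.g. RMS range error in metres
    occlusion: float         # fraction of surfaces occluded, 0..1
    clutter: float           # fraction of points off target classes, 0..1
    sparseness: float        # mean point spacing in metres

def within_scope(validation, target, tolerance=0.5):
    """Crude similarity check: the target may introduce no new object
    types, and each scalar dimension must stay within a relative
    tolerance of its value in the validation domain."""
    if not target.object_types <= validation.object_types:
        return False
    v, t = asdict(validation), asdict(target)
    return all(abs(t[k] - v[k]) <= tolerance * max(v[k], 1e-9)
               for k in v if k != "object_types")

office = DomainDescription(frozenset({"wall", "door"}), 0.005, 0.2, 0.1, 0.01)
plant = DomainDescription(frozenset({"wall", "pipe"}), 0.005, 0.6, 0.5, 0.01)
print(within_scope(office, plant))  # False: new object type, heavier occlusion
```

Even this toy check exposes the shortcomings noted above: "clutter" only has a value once a recognition task defines what counts as clutter.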

In order for the research community to understand the current state of research, the literature must be surveyed to determine overall semantic coverage, i.e. what building component classes fall within the scope of effectiveness of existing methods, and the object recognition performance achieved for each semantic class. According to Pătrăucean et al. [105], “the high diversity of works… in terms of application field, methodology, and goals, makes it difficult to [perform] an objective comparative analysis.” However, this kind of comparative analysis is important for the community and must be performed.
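The per-class performance figures such a survey aggregates are typically precision and recall over each building component class. A minimal sketch of that computation from parallel ground-truth and predicted label lists (illustrative only):

```python
from collections import Counter

def per_class_precision_recall(gold, pred):
    """Per-class (precision, recall) from parallel label lists.
    Classes with a zero denominator are reported as 0.0."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    classes = set(gold) | set(pred)
    return {c: (tp[c] / max(tp[c] + fp[c], 1),
                tp[c] / max(tp[c] + fn[c], 1)) for c in classes}

gold = ["wall", "wall", "door", "window", "wall"]
pred = ["wall", "door", "door", "wall", "wall"]
print(per_class_precision_recall(gold, pred)["wall"])
```

Reporting these metrics per class, rather than as a single overall accuracy, is what makes the semantic-coverage comparison across methods possible.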

This article reviews visual object recognition methods as they relate to automating the digital modeling of existing buildings and has three main points of departure from the review articles listed in Table 2. These points of departure are:

  • 1.

    Understanding fundamental function. We aggregate all processes involved in semantic modeling under the single concept of abstraction. We assert the two fundamental functions of abstraction are (1) feature standardization and (2) discriminator distillation. This is covered in Section 2.3.

  • 2.

Inferring scope of effectiveness. We present a new framework for describing representations. Our set of characteristics is comprehensive and agnostic to computer-vision hardware and the application of the representation. The set enables a rigorous delineation of an object recognition method's scope of effectiveness. This framework is described in Section 2.1.

  • 3.

    Performing an extensive and quantitative comparative performance evaluation. We present an objective comparative analysis of many visual object recognition methods using a set of commonalities referred to as the Minimal Standard Set. Semantic coverage and recognition performances of presented methods are reported in-depth and framed using a building taxonomy. This analysis is covered in Section 3.

Section snippets

Object recognition systems: general structure and variations

Here we define a search space as any digital representation, e.g. point cloud, 2D image or photo, and RGB-D frame, in which an object recognition algorithm will search for objects of interest. These objects of interest are defined by digital object representations sourced either from data or expert conception. Object recognition is the process of recognizing equivalence between an object representation and a subset of a search space. It requires a system capable of perception and analogical
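The core notion of recognizing equivalence between an object representation and a subset of a search space can be illustrated with the simplest possible instance: sliding-window template matching over a 2D array. This toy sketch stands in for the far richer matching machinery the reviewed methods use:

```python
import numpy as np

def find_object(search_space, template):
    """Locate an object representation inside a 2D search space by
    sliding-window sum of squared differences (recognition as matching)."""
    H, W = search_space.shape
    h, w = template.shape
    best, best_pos = float("inf"), None
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            ssd = np.sum((search_space[i:i + h, j:j + w] - template) ** 2)
            if ssd < best:
                best, best_pos = ssd, (i, j)
    return best_pos, best

space = np.zeros((8, 8))
space[3:5, 2:4] = 1.0                  # an "object" embedded in the scene
pos, score = find_object(space, np.ones((2, 2)))
print(pos, score)  # (3, 2) 0.0 — an exact match at that location
```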

Comparative performance analysis

We have reviewed the general structure and variations of visual object recognition methods and provided a rigorous framework for describing the digital representations these methods process. Now we perform an extensive and quantitative comparative performance evaluation of the recognition methods presented in the literature. The scope and methodology of the performance evaluation are as follows:

The scope of the performance evaluation is limited to visual object recognition methods involved in

Conclusions

The object recognition literature, as it relates to the automated digital modeling of existing buildings, was reviewed. Evidently, no method has yet been able to fulfil the promise of comprehensive semantically-rich BIM creation. In order to successfully contend with the diversity of buildings, our community must combine and scale existing object recognition methods and create new object recognition methods. In order to facilitate this mission, three limitations of existing review articles were

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was supported by the National Science Foundation (NSF) of the United States of America under award number 1562438. Their support is gratefully acknowledged. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Mention of trade names in this article does not imply endorsement by the University of Texas at Austin or NSF.

References (166)

  • T. Czerniawski et al.

    Pipe spool recognition in cluttered point clouds using a curvature-based shape descriptor

    Autom. Constr.

    (2016)
  • T. Czerniawski et al.

    6D DBSCAN-based segmentation of building point clouds for planar object recognition

    Autom. Constr.

    (2018)
  • L. Díaz-Vilariño et al.

    Indoor daylight simulation performed on automatically generated as-built 3D models

    Energ. Buildings

    (2014)
  • A. Dimitrov et al.

    Vision-based material recognition for automated monitoring of construction progress and generating building information modeling from unordered site image collections

    Adv. Eng. Inform.

    (2014)
  • H. Fathi et al.

    Automated as-built 3D reconstruction of civil infrastructure using computer vision: achievements, opportunities, and challenges

    Adv. Eng. Inform.

    (2015)
  • Y. Ham et al.

    Mapping actual thermal properties to building elements in gbXML-based BIM for reliable building energy performance modeling

    Autom. Constr.

    (2015)
  • H. Hamledari et al.

    Automated computer vision-based detection of components of under-construction indoor partitions

    Autom. Constr.

    (2017)
  • J. Jung et al.

    Productive modeling for development of as-built BIM of existing indoor structures

    Autom. Constr.

    (2014)
  • C. Kropp et al.

    Interior construction state recognition with 4D BIM registered image sequences

    Autom. Constr.

    (2018)
  • D.F. Laefer et al.

    Toward automatic generation of 3D steel structures for building information modelling

    Autom. Constr.

    (2017)
  • Q. Lu et al.

    Image-driven fuzzy-based system to construct as-is IFC BIM objects

    Autom. Constr.

    (2018)
  • R. Maalek et al.

    Extraction of pipes and flanges from point clouds for automated verification of pre-fabricated modules in oil and gas refinery projects

    Autom. Constr.

    (2019)
  • C. Mura et al.

    Automatic room detection and reconstruction in cluttered indoor environments with complex room layouts

    Comput. Graph.

    (2014)
  • A.C. Murillo et al.

    Visual door detection integrating appearance and shape cues

    Robot. Auton. Syst.

    (2008)
  • M. Nahangi et al.

    Automated 3D compliance checking in pipe spool fabrication

    Adv. Eng. Inform.

    (2014)
  • M. Nahangi et al.

    Automated assembly discrepancy feedback using 3D imaging and forward kinematics

    Autom. Constr.

    (2015)
  • M. Neuhausen et al.

    Automatic window detection in facade images

    Autom. Constr.

    (2018)
  • A. Adam et al.

    H-Ransac: a hybrid point cloud segmentation combining 2D and 3D data

    ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences

    (2018)
  • A. Adan et al.

    3D reconstruction of interior wall surfaces under occlusion and clutter

  • E. Agapaki et al.

    CLOI: A shape classification benchmark dataset for industrial facilities

  • M.F. Ahmed et al.

    Automatic detection of cylindrical objects in built facilities

    J. Comput. Civ. Eng.

    (2014)
  • H. Aljumaily et al.

    Big-data approach for three-dimensional building extraction from aerial laser scanning

    J. Comput. Civ. Eng.

    (2015)
  • M. Alzantot et al.

    Crowdinside: automatic construction of indoor floorplans

  • I. Anagnostopoulos et al.

    Detection of walls, floors, and ceilings in point cloud data

  • F. Apollonio et al.

    3D modeling and data enrichment in digital reconstruction of architectural heritage

    ISPRS Archives

    (2013)
  • I. Armeni et al.

    3D semantic parsing of large-scale indoor spaces

  • F. Banfi et al.

    BIM automation: Advanced modeling generative process for complex structures

  • M. Bassier et al.

    IFC wall reconstruction from unstructured point clouds, ISPRS annals of photogrammetry

    Remote Sensing & Spatial Information Sciences

    (2018)
  • M. Bassier et al.

    Automated semantic labelling of 3D vector models for scan-to-BIM

  • S. Becker et al.

    Grammar-supported 3D indoor reconstruction from point clouds for "as-built" BIM

    ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences

    (2015)
  • M. Blaha et al.

    Large-scale semantic 3D reconstruction: An adaptive multi-resolution model for multi-class volumetric labeling

  • A. Borrmann et al.

    Principles of geometric modeling

  • A. Borrmann et al.

    Building Information Modeling: Why? What? How?, Building Information Modeling: Technology Foundations and Industry Practice

    (2018)
  • A. Budroni et al.

    Automated 3D reconstruction of interiors from point clouds

    Int. J. Archit. Comput.

    (2010)
  • A. Budroni, J. Böhm. Toward automatic reconstruction of interiors from laser data, in: Proc. Virtual Reconstruction and...
  • J. Bughin, E. Hazan, S. Ramaswamy, M. Chui, T. Allas, P. Dahlström, N. Henke, M. Trench. Artificial Intelligence–The...
  • R. Cabezas et al.

    Semantically-aware aerial reconstruction from multi-modal data

  • J. Chai et al.

    Automatic as-built modeling for concurrent progress tracking of plant construction based on laser scanning

    Concurr. Eng.

    (2016)
  • A. Chang et al.

    Matterport3D: Learning from RGB-D Data in Indoor Environments

    (2017)
  • R.P. Charette et al.

    UNIFORMAT II Elemental Classification for Building Specifications, Cost Estimating, and Cost Analysis

    (1999)