Automated digital modeling of existing buildings: A review of visual object recognition methods
Introduction
In 2017, the stored value of all global real estate was US$228 trillion, with commercial real estate comprising US$32.3 trillion [138]. Assets in this system are subject to adaptive reuse, addition, alteration, preservation, reconstruction, recycling, rehabilitation, remodeling, renovation, restoration, retrofit, repair, and cleaning [123]. Optimization of these operations is facilitated by the development of information systems. Companies in almost every industry are building digital assets, expanding digital usage, and creating a more digital workforce [27]. Building information modeling (BIM) is the building industry's contribution to this digitization trend. BIM is a system of digital building representation (Fig. 1) created to give architects, engineers, contractors, and facility managers more control over the built environment by enabling and promoting new forms of simulation, automation, and information sharing.
A comprehensive list of well-established and emerging building information model (BIM) uses was assembled by Change Agents as part of their BIMe initiative [132]. Their list reveals the vast diversity of information BIMs can contain. Since every model has limited capacity and cannot possibly represent every detail of a building, modeling is fundamentally a process of selection. The modeler makes inclusion decisions based on relevance and convenience. “We try to abstract only the information essential for our objectives while ignoring other information” (p. 1) [92]. Geometrical information is often relevant when representing buildings and is, therefore, the foundation of every BIM [20].
Geometric information describes the shape of a building. This includes both the shape of individual building components as well as the way these building components are arranged to make up the building (e.g. aggregation relationships, topological relationships, and directional relationships [135]). Selecting the level of fidelity is central to geometrical modeling. Triangle meshes and point clouds [153] can preserve a high degree of surface detail and are often used for higher fidelity representation. Lower fidelity representations can be made by approximating building components using geometric primitives [82,83,161] such as planes [119], lines, rectangles, cuboids [150], circles [40], ellipses, arcs, spheres, cylinders, cones, ellipsoids, and composites. These geometric primitives provide a compact set of parameters/features used to describe the geometry. More complex curved surfaces can be represented by Bézier splines and non-uniform rational basis splines (NURBS) [12,13,19,48,50]. Decreasing the fidelity of a representation sacrifices flexibility and surface detail for storage efficiency. The literature refers to these varying degrees of fidelity using several dichotomies (Table 1).
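As an illustration of how a geometric primitive compresses a higher-fidelity representation, the sketch below (our hypothetical example, not from any cited method) fits a plane to a noisy point cloud by least squares, reducing a thousand points to four parameters: a unit normal and an offset.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit: returns (unit normal n, offset d) with n . p = d."""
    centroid = points.mean(axis=0)
    # The singular vector of the centered cloud with least variance is the normal.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return normal, normal @ centroid

# A noisy, roughly horizontal patch of 1000 points (stand-in for a scanned floor).
rng = np.random.default_rng(0)
xy = rng.uniform(0, 5, size=(1000, 2))
z = 0.02 * rng.standard_normal(1000)  # sensor noise around z = 0
cloud = np.column_stack([xy, z])

normal, d = fit_plane(cloud)
print(normal, d)  # normal close to (0, 0, +/-1), offset close to 0
```

The trade-off described in the text is visible here: the plane parameters are far more compact than the point cloud, but the per-point surface detail (the noise, and any real deviation from planarity) is discarded.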
With geometry as the foundation, modelers augment the representation with object metadata (i.e. semantics). This metadata can relate to any facet of the built environment, including design, engineering, fabrication, construction, and facility management [132]. For example, a window could potentially have metadata including: unit cost, area, weight, material, manufacturer, supplier, heat transfer coefficient, adjoining room name, and installation date. These semantics can be used to perform analysis and computation far exceeding the capabilities provided by geometry alone.
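To make the window example concrete, its metadata could be encoded as a simple record; the field names below mirror the metadata listed in the text, while the class itself and the values are illustrative only, not drawn from any standard.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class WindowMetadata:
    # Fields mirror the example metadata listed in the text; values are made up.
    unit_cost: float                   # currency units
    area: float                        # m^2
    weight: float                      # kg
    material: str
    manufacturer: str
    supplier: str
    heat_transfer_coefficient: float   # W/(m^2*K)
    adjoining_room_name: str
    installation_date: date

w = WindowMetadata(350.0, 1.2, 18.5, "aluminum/glass", "Acme Windows",
                   "BuildCo Supply", 1.4, "Office 2.01", date(2019, 6, 3))

# Semantics enable analysis beyond geometry, e.g. a simple heat-loss estimate:
delta_t = 20.0  # K, assumed indoor-outdoor temperature difference
heat_loss = w.heat_transfer_coefficient * w.area * delta_t
print(round(heat_loss, 1))  # 33.6 (W)
```

A heat-loss estimate like this is exactly the kind of computation that geometry alone (area) could not support without the attached semantics (the heat transfer coefficient).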
BIM-based analysis, simulation, and automation often span several information systems. Interoperability between information systems is improved by following standards that define building taxonomies. Example standards include Uniformat [31], OmniClass [137], and Industry Foundation Classes (IFC) [71]. Uniformat and Table 21 of the OmniClass system provide a standardized basis for classifying information using hierarchies of construction elements. The OmniClass system defines an element as: a major component, assembly, or “construction entity part which, in itself or in combination with other parts, fulfills a predominating function of the construction entity” (ISO 12006-2) [72]. IFC was developed by the International Alliance for Interoperability, now known as buildingSMART International. IFC provides a formalized representation of typical building components (e.g., wall, door), attributes (e.g., type, function, geometric description), relationships (e.g., physical relationships, such as supported-by, connected-to), and more abstract concepts, such as schedules, activities, spaces, and construction costs, in the form of entities [77].
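As a loose illustration of the entity-based structure the text describes (a schema-free sketch of ours, not actual IFC syntax), a wall entity with attributes and relationships might be encoded as:

```python
# A hedged sketch of an IFC-style entity: the component, attribute, and
# relationship names come from the text above; the structure is illustrative only.
wall = {
    "entity": "wall",
    "attributes": {
        "type": "standard",
        "function": "load-bearing",
        "geometric_description": "extruded rectangle",
    },
    "relationships": [
        {"kind": "supported-by", "target": "slab-01"},
        {"kind": "connected-to", "target": "wall-02"},
    ],
}

# With entities structured this way, relationship queries are simple traversals:
supports = [r["target"] for r in wall["relationships"] if r["kind"] == "supported-by"]
print(supports)  # ['slab-01']
```

It is this explicit, queryable linkage between components that lets downstream systems exchange and analyze building information consistently.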
Despite its superior capabilities for managing information, BIM has yet to enable multi-decade information stewardship [116]. The organizations that have adopted BIM find the upkeep of models [12,70] prohibitively expensive and facility operators in the United States continue to spend $4.8 billion annually searching for, validating, and recreating facility information [52]. In an effort to make BIM adoption easier, researchers have been automating parts of BIM creation by applying reality capture devices, computer vision, and 3D modeling algorithms [32,51,105,135,143,151]. The process has four steps (Fig. 2):
- 3D Reconstruction - the existing building is digitized using reality capture technologies, e.g. laser scanners and range cameras,
- Semantic modeling - subsets of the 3D reconstruction are sorted into semantic classes defined by a BIM taxonomy, e.g. wall, door, window, and column,
- Geometrical modeling - the shape of each class instance and the spatial relationships between class instances are described using geometric parameters, e.g. height, elevation, equal height to, parallel to, and
- Building information modeling - semantics and geometrical parameters are used to generate building components in a BIM authoring software.
Many reviewed publications address automated geometrical modeling, fitting polyhedral models to 3D reconstructions and imposing additional constraints [110,163], such as alignment with gravity [150], co-linearity [50], rectangularity, and parallelism [24]. In fact, automation of geometrical modeling has progressed to the point where various commercial software applications are available on the market (e.g., EdgeWise by ClearEdge3D, CloudWorx by Digital Scan 3D, PolyWorks Modeler by InnovMetric, RealWorks by Trimble) that perform semi-automated generation of BIMs and computer-aided design (CAD) models from various sources [84].
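Constraint imposition of this kind can be illustrated with a small sketch (our own, not any of the cited methods): a wall normal estimated from noisy scan data is snapped to the nearest gravity-consistent direction, i.e. forced to be exactly horizontal.

```python
import numpy as np

def align_with_gravity(normal, up=np.array([0.0, 0.0, 1.0])):
    """Project a fitted wall normal onto the horizontal plane (perpendicular to
    gravity) and renormalize - a simple form of the gravity-alignment constraint."""
    horizontal = normal - (normal @ up) * up
    return horizontal / np.linalg.norm(horizontal)

# A wall normal estimated from noisy scan data, tilted slightly out of plumb.
fitted = np.array([0.998, 0.02, 0.06])
fitted /= np.linalg.norm(fitted)

constrained = align_with_gravity(fitted)
print(constrained)  # z-component is now zero: the wall is exactly vertical
```

Analogous projections enforce the other constraints mentioned above, e.g. snapping nearly parallel wall normals to a shared direction for parallelism.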
Semantic modeling encompasses a number of methods that transform input digital representations (i.e. 3D reconstructions) into more abstract semantic representations. Published articles typically each have a narrow focus on a few semantic object categories, such as plumbing [129], structural steel [21], or walls, floors, doors, and windows [8,16,94,101,111].
Due to the diversity of objects and systems encountered in buildings as well as the diversity of applications and contexts these objects and systems are found in, no method has yet been able to fulfil the promise of comprehensive semantically-rich BIM creation.
On our research community's mission to successfully contend with the diversity of buildings, review articles provide frameworks that guide our community's process of combining, scaling, and creating visual object recognition methods. Here we identify three limitations of the review articles in Table 2. These limitations relate to method evaluation: (1) understanding fundamental function, (2) inferring scope of effectiveness, and (3) performing an extensive and quantitative comparative performance evaluation.
The review articles in Table 2 partition semantic modeling into processes including: feature extraction (geometric, radiometric, colorimetric, and contextual), segmentation, object recognition, and classification [32]; geometric primitive detection, point cloud clustering, shape fitting, one-to-one matching [105]; object identification, extraction of relational and semantic information, data-driven feature/shape/material/statistical matching, model-based knowledge and contextual information based matching [143]. These review articles successfully account for the many abstraction methods involved in semantic modeling, but fail to present these methods with any explanation of their fundamental function. For example, Tang et al. [135] states that these abstraction methods are necessary because “explicit shape [lower abstraction] representations… are not very well suited for… automatically segmenting or recognizing building components”. The article does not elaborate on why low abstraction representations are not well suited and why higher abstraction representations are needed. A hint of insight about fundamental function is found in Pătrăucean et al. [105] when it states that these abstraction methods “encode, as compactly as possible, the distinctiveness of an object.” This distillation of distinct (discriminating) features enables classifiers to more effectively perform object recognition. This kind of clarity about function enables researchers to accurately interpret parts of methods as being vital or unnecessary. Understanding fundamental function is especially important when adopting parts of existing methods and is currently missing in the review articles.
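The role of discriminator distillation can be made concrete with a toy sketch (our illustration, not a method from the cited reviews): raw point coordinates are abstracted into two distilled features, vertical and horizontal extent, that a trivial rule can then use to separate walls from floors far more easily than it could from the raw points.

```python
import numpy as np

def distilled_features(segment):
    """Abstract a raw point segment into two discriminating features:
    vertical extent and maximum horizontal extent."""
    extents = segment.max(axis=0) - segment.min(axis=0)
    return extents[2], max(extents[0], extents[1])

def classify(segment):
    """Toy classifier operating on the distilled features, not the raw points."""
    vertical, horizontal = distilled_features(segment)
    return "wall" if vertical > horizontal else "floor"

rng = np.random.default_rng(1)
# A wall-like slab (tall, thin) and a floor-like slab (wide, flat).
wall_points = rng.uniform([0, 0, 0], [2.0, 0.2, 2.7], size=(500, 3))
floor_points = rng.uniform([0, 0, 0], [5.0, 4.0, 0.05], size=(500, 3))

print(classify(wall_points), classify(floor_points))  # wall floor
```

The 1500 raw coordinates of each segment are compressed into two numbers that encode, as compactly as possible, what distinguishes the classes, which is precisely the function the quoted passage attributes to abstraction.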
Inferring scope of effectiveness is a critical step in:
(1) applying existing methods to new data, new contexts, and new representations or, as Tang et al. [135] states, determining “extensibility to new environments. Can the algorithm be extended to handle new types of objects? Can the algorithm be applied to different types of environments, or is it specific to one or a certain class of spaces?” and
(2) designing new methods to effectively process data falling outside the scope of effectiveness of existing methods.
Existing methods are typically validated on a particular domain represented by a validation dataset. The existing method's scope of effectiveness is defined by all domains that are sufficiently similar to the domain represented by the validation dataset. “Sufficiently” depends on the method's generalization [102] ability. Any discussion of similarity between validation domains and target domains requires a rigorous definition of the possible dimensions of similarity. These dimensions of similarity follow directly from any framework for describing data (i.e. representations). Pătrăucean et al. [105] describes representations by completeness, compactness, and uniqueness. Both Pătrăucean et al. [105] and Tang et al. [135] describe representations as explicit, implicit, parametric, and non-parametric. It can be argued that a method validated on implicit representations will have other similar implicit representations within its scope of effectiveness. Although accurate, this determination lacks the requisite detail to be useful. On the other hand, Tang et al. [135] presents a second and more rigorous framework for describing representations that includes the dimensions:
- types of objects present
- level of sensor noise
- level of occlusion
- level of clutter
- presence of moving objects
- presence of specular surfaces
- presence of dark (low-reflectance) surfaces
- sparseness of data
Comparing a validation domain and target domain using these dimensions provides a similarity analysis with greater detail. However, even these dimensions have shortcomings. “Level of clutter” is not an objective description of digital representations, since it will change depending on the particular recognition task and its definition of clutter. The relevance of “presence of specular and dark surfaces” changes depending on the computer vision hardware used to collect the digital representations, since these types of surfaces cause erroneous measurements with some hardware [104] and not with others. In order to rigorously and objectively infer a method's scope of effectiveness, the community requires a new framework for describing representations. This framework must be: comprehensive, agnostic to computer-vision hardware, and agnostic to the particular recognition task the representation is used for.
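The quantitative dimensions from Tang et al. [135] listed above can be turned into a simple domain descriptor; the record and distance function below are our hedged sketch of how a validation domain and a target domain might be compared, not a framework from any cited work (the categorical "types of objects present" dimension is omitted, and the 0-1 levels are assumed).

```python
from dataclasses import dataclass, astuple

@dataclass
class DomainDescriptor:
    # Dimensions paraphrased from Tang et al. [135]; levels on an assumed 0-1 scale.
    sensor_noise: float
    occlusion: float
    clutter: float
    moving_objects: float
    specular_surfaces: float
    dark_surfaces: float
    sparseness: float

def domain_distance(a, b):
    """L1 distance between two domain descriptors - a crude similarity proxy."""
    return sum(abs(x - y) for x, y in zip(astuple(a), astuple(b)))

validation = DomainDescriptor(0.1, 0.2, 0.3, 0.0, 0.1, 0.0, 0.2)
target = DomainDescriptor(0.2, 0.5, 0.3, 0.1, 0.1, 0.0, 0.4)
print(round(domain_distance(validation, target), 2))  # 0.7
```

A small distance would suggest the target domain lies within the method's scope of effectiveness; the shortcomings noted above (e.g. clutter being task-dependent) apply equally to any such numeric comparison.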
In order for the research community to understand the current state of research, the literature must be surveyed to determine overall semantic coverage, i.e. what building component classes fall within the scope of effectiveness of existing methods, and the object recognition performance achieved for each semantic class. According to Pătrăucean et al. [105], “the high diversity of works… in terms of application field, methodology, and goals, makes it difficult to [perform] an objective comparative analysis.” However, this kind of comparative analysis is important for the community and must be performed.
This article reviews visual object recognition methods as they relate to automating the digital modeling of existing buildings and has three main points of departure from the review articles listed in Table 2. These points of departure are:
1. Understanding fundamental function. We aggregate all processes involved in semantic modeling under the single concept of abstraction. We assert the two fundamental functions of abstraction are (1) feature standardization and (2) discriminator distillation. This is covered in Section 2.3.
2. Inferring scope of effectiveness. We present a new framework for describing representations. Our set of characteristics is comprehensive and agnostic to both computer-vision hardware and the application of the representation. The set enables a rigorous delineation of an object recognition method's scope of effectiveness. This framework is described in Section 2.1.
3. Performing an extensive and quantitative comparative performance evaluation. We present an objective comparative analysis of many visual object recognition methods using a set of commonalities referred to as the Minimal Standard Set. Semantic coverage and recognition performances of presented methods are reported in-depth and framed using a building taxonomy. This analysis is covered in Section 3.
Object recognition systems: general structure and variations
Here we define a search space as any digital representation, e.g. point cloud, 2D image or photo, and RGB-D frame, in which an object recognition algorithm will search for objects of interest. These objects of interest are defined by digital object representations sourced either from data or expert conception. Object recognition is the process of recognizing equivalence between an object representation and a subset of a search space. It requires a system capable of perception and analogical
Comparative performance analysis
We have reviewed the general structure and variations of visual object recognition methods and provided a rigorous framework for describing the digital representations these methods process. Now we perform an extensive and quantitative comparative performance evaluation of the recognition methods presented in the literature. The scope and methodology of the performance evaluation are as follows:
The scope of the performance evaluation is limited to visual object recognition methods involved in
Conclusions
The object recognition literature, as it relates to the automated digital modeling of existing buildings, was reviewed. Evidently, no method has yet been able to fulfil the promise of comprehensive semantically-rich BIM creation. In order to successfully contend with the diversity of buildings, our community must combine and scale existing object recognition methods and create new object recognition methods. In order to facilitate this mission, three limitations of existing review articles were
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research was supported by the National Science Foundation (NSF) of the United States of America under award number 1562438. Their support is gratefully acknowledged. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Mention of trade names in this article does not imply endorsement by the University of Texas at Austin or NSF.
References (166)
- et al., Scan-to-BIM for ‘secondary’ building components, Adv. Eng. Inform. (2018)
- et al., 50 years of object recognition: directions forward, Comput. Vis. Image Underst. (2013)
- Parametric as-built model generation of complex shapes from point clouds, Adv. Eng. Inform. (2016)
- et al., Classification of sensor independent point cloud data of building objects using random forests, J. Build. Eng. (2019)
- Automated recognition of 3D CAD model objects in laser scans and calculation of as-built dimensions for dimensional compliance control in construction, Adv. Eng. Inform. (2010)
- et al., The value of integrating scan-to-BIM and scan-vs-BIM techniques for construction monitoring using laser scanning and BIM: the case of cylindrical MEP components, Autom. Constr. (2015)
- et al., Automating surface flatness control using terrestrial laser scanning and building information models, Autom. Constr. (2014)
- Building reconstruction from images and laser scanning, Int. J. Appl. Earth Obs. Geoinf. (2005)
- et al., Performance evaluation of 3D descriptors for object recognition in construction applications, Autom. Constr. (2018)
- et al., Automatic building information model reconstruction in high-density urban areas: augmenting multi-source data with architectural knowledge, Autom. Constr. (2018)