An ontology-based spatial data harmonisation for urban analytics

https://doi.org/10.1016/j.compenvurbsys.2018.06.009

Highlights

  • This paper presents a new means of using ontology to resolve heterogeneous data problems in urban analytics.

  • The practice of developing domain ontology from domain knowledge is articulated.

  • The heterogeneities among datasets are eliminated by applying a two-level mapping mechanism.

  • A semantic translation engine is implemented to convert mapped data layers into a consolidated format described in the ontology schema.

  • An urban density analysis tool which can handle heterogeneous data is created to test the usability of the proposed method.

Abstract

Data heterogeneity is one of the most challenging problems in urban data analytics. When obtained from various providers or custodians, datasets for the same domain themes may differ dramatically in format for many reasons, such as historical legacies and changing definitions or standards across jurisdictions. This hinders urban analysts and researchers from understanding and using these data and makes results hard to compare and interpret. An ontology, usually created by domain experts, offers a comprehensive representation of knowledge, including the concepts, relations and properties in a domain. It defines the real world in the abstract and offers a universal and stable schema for data harmonisation. This paper proposes a fast, extensible solution for eliminating data heterogeneity by using ontology. Starting from conceptualising domain knowledge into a domain ontology, we discuss a two-level mapping mechanism that links data and ontology using mapping rules. A semantic translation engine is also introduced to automate the data harmonisation process. A real case, urban density indicator computation, demonstrates the usability of the proposed framework, and the results show strong potential for applying this method to broader urban analytics application scenarios.

Introduction

Over the next three decades, more than half of the world's population is expected to live in cities. While cities occupy about 2% of land mass worldwide, they produce more than 80% of global GDP (Dobbs et al., 2011), which is a large economic footprint. Cities also contribute more than 70% of the world's greenhouse gas emissions, which adds up to a severe environmental footprint (UN-HABITAT, 2011). In addition, several other challenges associated with urbanisation affect quality of life, public services accessibility, housing affordability, and health.

Rapid urbanisation worldwide is known to challenge urban planning and management tasks as it brings threats and opportunities for cities. In the current digital era, data management is considered the main enabler of urban planning, management, and decision-making. However, urban data is still challenged by its notorious heterogeneity (Psyllidis, Bozzon, Bocconi, & Titos Bolivar, 2015; Rajabifard, Ho, & Sabri, 2016). The new streams of big data have further complicated these issues. Big data is usually composed of voluminous and complex data from various sources (e.g. sensor data, social media, and enterprise data), which requires classic decision-making organisations to revise their regulatory frameworks for effective utilisation (Sabri, Rajabifard, Ho, Namazi-Rad, & Pettit, 2015). Urban planners are still struggling to interpret the various dimensions of available urban data, particularly when required to understand and plan for complex urban issues such as high-rise building development and its impact on urban temperature. The main challenge is the ability to effectively source, access and leverage the appropriate data for evidence-based planning and decision making.

Urban development analyses involve multi-disciplinary data gathering and analytics (e.g. buildings, infrastructures, populations, and green spaces). As a result, the multi-disciplinary and multi-scale data challenges of urban analytics make the task unique and complex. In general, each discipline has its own data sources that need to be standardised for interoperability, harmonisation and integration for analysis and modelling, fostering complex planning and decision-making tasks.

There have been several initiatives around the world to address the issues of urban data accessibility and interoperability. Examples are the Australian Urban Research Infrastructure Network (AURIN) (Sinnott et al., 2015), the Urban Big Data Centre (UBDC) in the UK (Thakuriah, Dirks, & Keita, 2016), and the University of Chicago's Urban Centre for Computation and Data (UrbanCCD) (Catlett et al., 2014). These initiatives and several other similar platforms provide urban researchers and decision-makers with unique access to thousands of datasets and analytic tools. However, notwithstanding these initiatives, challenges remain in harnessing data from different sources and integrating diverse types of data for robust analyses. For instance, urban planning issues such as decision making for housing affordability need data about land use, income, population density, and transport. While these data might be available through the existing platforms mentioned above, automated data integration is not possible due to data heterogeneity. Another limitation is the provenance and extensibility of the data used and the models developed within these platforms. As an example, Pettit et al. (Pettit, Tanton, & Hunter, 2017) observe that the Shift-Share analysis tool developed in AURIN defines names of derived variables and files but does not provide users with information about the input data and the method. As such, the lack of provenance limits the ability of tools and models to be scaled and extended to other contexts. Consequently, it underscores the need to apply knowledge from different domains in determining the semantics of data and their ontology across different jurisdictions while engaging with urban analytics, planning, and management (Catlett et al., 2014; Rajabifard et al., 2016; Thakuriah et al., 2016; Villa, Molina, Gomarasca, & Roccatagliata, 2011).

As such, location-based and evidence-based city planning and policy-making reportedly struggle to achieve practical analytics and data-driven decision making because of the lack of access to robust spatial platforms and data sharing infrastructures (Kyttä, Broberg, Tzoulas, & Snabb, 2013; Sabri, Rajabifard, Ho, Amirebrahimi, & Bishop, 2016). In addition, current geospatial databases are used for local- or domain-specific analyses. As a result, city planning and urban development monitoring activities are challenged by the lack of integrated spatial planning and management due to the absence of organised and complex spatial data infrastructures.

Substantial work has been undertaken in the past decade. For example, Benslimane et al. (Benslimane, Leclercq, Savonnet, Terrasse, & Yetongnon, 2000) define a spatial ontology to describe key features of urban applications, providing a foundation for semantic reconciliation among heterogeneous spatial information sources. Fonseca et al. (Fonseca, Egenhofer, Davis Jr, & Borges, 2000) propose creating software components from diverse ontologies using an object-oriented mapping as a way to share knowledge and data. Raskin and Pan (Raskin & Pan, 2005) develop a collection of ontologies using the web ontology language (OWL) that include both orthogonal concepts (space, time, Earth realms, physical quantities, etc.) and integrative science knowledge concepts (phenomena, events, etc.) for their environmental research. Konstantinou et al. (Konstantinou, Spanos, & Mitrou, 2008) also raise and discuss the problem of mapping relational database contents and ontologies and argue that the addition of formal semantics to the databases is important to make information searchable, accessible and retrievable. Consistent with these efforts, Buccella et al. (Buccella et al., 2011) design and implement a system called GeoMergeP to build a global normalised ontology for integrating geographic data sources. They devise two steps for this purpose. First, a semantic enrichment process is applied to the data to derive a top-level and domain ontology based on the domain ontology of the source and the ISO standards; then, in a merging process, a shared vocabulary or global ontology is created from the enriched ontologies. When all data sources are mapped to the global ontology, a federated database is formed for use. Pileggi and Hunter (Pileggi & Hunter, 2017) introduce their ontological approach for establishing interoperability among heterogeneous datasets for urban indicators computation.

In evaluating these previous efforts, a key observation is that most of them parse datasets into a semantic format (e.g., tuples) and provide data discovery and reasoning capabilities by adopting semantic technologies. There are, however, two main drawbacks to these methods. First, converting and storing datasets in a semantic format means that an extra copy of the data has to be maintained, which makes data updates and synchronisation intractable. Second, in the geospatial and urban analysis domain, many existing models (e.g., road network connectivity, spatial association, agglomeration, clustering, isochrone) (Day, Chen, Ellis, & Roberts, 2016; Day, Chen, Ellis, & Roberts, 2017; Yiqun Chen & Rajabifard, 2017; Yiqun Chen, Rajabifard, Spring, Gouldbourn, & Griffin, 2016) and procedures (e.g., spatial union, join, buffer, intersect, clip) are not designed to consume semantic data formats and are not compatible with semantic technologies. They expect inputs described in traditional geospatial formats, so a harmonisation method should keep data in those formats while eliminating the heterogeneity issues, thereby improving the usability of these models and procedures.

This paper proposes an ontology-based framework for data heterogeneity elimination by focusing on data accessibility and integration, including provenance and extensibility. It starts with conceptualising domain knowledge and developing this into a domain ontology. It continues with the introduction of a two-level mapping mechanism, which links data and ontology using semantic enrichment rules. This approach differs from existing methodologies for semantic enrichment of geospatial data in that it converts the raw data layer format into a uniform structure described by the ontology schema. As explained in Section 4, this mitigates the issue of physically storing any extra data. Section 4 also introduces a semantic translation engine that automates data harmonisation processes. Section 6 explains and demonstrates the usability of the proposed framework in a real case – urban density indicator computation. The last section gives an account of how the proposed framework enables robust urban analytics and decision making and offers suggestions for improvement while speculating on the future directions of ontology-based spatial data harmonisation and urban analytics.

Section snippets

From domain knowledge to domain ontology

Ontologies are used for different purposes including intelligent integration of information, the Semantic Web, natural language processing, and knowledge management. From a computer science point of view and in the context of knowledge acquisition, an ontology could be defined as “a formal, explicit specification of a shared conceptualisation” (Staab et al., 2009). In this definition, “formal” refers to the language that is used for the description of the ontology specifications. This language

Heterogeneity elimination – mapping data with ontology

As explained in the introduction section, a big challenge in urban analytics is to resolve the data heterogeneity problem so that the analysis process can be performed in a unified manner and the results are comparable and interpretable. This way, analysts can consistently present inferences and arguments. In many situations, urban analysts and researchers handle data from diverse sources, and things get worse when datasets are tangled with historical legacies and changing definitions or standards

Semantic translation engine

In the previous section, the two-level mapping (‘data-concept’ and ‘attribute-property’) of semantic enrichment theoretically addresses the data heterogeneity problem. By adopting the mapping rules, data layers will be translated into a consolidated format described in the ontology schema for future analyses. A semantic translation engine is required for automating this process. Here, ‘semantic’ means that the ontology schema will be retained during and after the translation. In other words, the names
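The two-level mapping described in this snippet can be illustrated with a minimal sketch. It is not the authors' engine: the `LayerMapping` class, the `translate` function, the `Building` concept and every attribute and property name below are hypothetical, chosen only to show how a ‘data-concept’ rule and a set of ‘attribute-property’ rules could rename source attributes to ontology property names on the fly, without storing a second copy of the data.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class LayerMapping:
    """Mapping rules for one source data layer (all names are hypothetical)."""
    concept: str                # level 1: data layer -> ontology concept
    properties: Dict[str, str]  # level 2: source attribute -> ontology property


def translate(records: List[Dict[str, Any]], mapping: LayerMapping) -> List[Dict[str, Any]]:
    """Rename mapped attributes to ontology property names and tag the concept.

    Attributes without a mapping rule are dropped, so every translated
    record exposes only names defined by the (assumed) ontology schema.
    """
    translated = []
    for record in records:
        row: Dict[str, Any] = {"concept": mapping.concept}
        for source_name, property_name in mapping.properties.items():
            if source_name in record:
                row[property_name] = record[source_name]
        translated.append(row)
    return translated


# Two providers describe buildings with different attribute names.
council_a = [{"BLDG_HT": 12.5, "FLOORS": 4, "SURVEYOR": "n/a"}]
council_b = [{"height_m": 30.0, "num_storeys": 10}]

mapping_a = LayerMapping("Building", {"BLDG_HT": "height", "FLOORS": "storeys"})
mapping_b = LayerMapping("Building", {"height_m": "height", "num_storeys": "storeys"})

# After translation both layers share one structure.
harmonised = translate(council_a, mapping_a) + translate(council_b, mapping_b)
for row in harmonised:
    print(row)
```

Because the rules are plain data, adding a new provider in this sketch means writing one more `LayerMapping` rather than changing any analysis code.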

Case study – urban density indicators calculation

Planners make abstract assumptions about density when carrying out strategic planning. Also, politicians use density as an indicator to show concerns about the quality of urban life, but it is unclear which density (in different morphologies) is appropriate due to the lack of understanding about the relationship between building, people, and open space densities. It assumes that simplification of approach due to the complexity of computation and significant parameters are not considered in decision
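To make the idea of building, people, and open space densities concrete, here is a small sketch that computes three simple ratios for one area. The indicator definitions, the `Precinct` fields and the sample figures are assumptions for illustration only; the snippet above does not state which indicators or formulas the case study actually uses.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Precinct:
    """Harmonised inputs for one study area (hypothetical field names)."""
    area_ha: float
    population: int
    building_footprint_ha: float
    open_space_ha: float


def density_indicators(p: Precinct) -> Dict[str, float]:
    """Return three illustrative density ratios for a precinct."""
    if p.area_ha <= 0:
        raise ValueError("area_ha must be positive")
    return {
        "population_density": p.population / p.area_ha,        # people per hectare
        "building_coverage": p.building_footprint_ha / p.area_ha,  # share of land built on
        "open_space_ratio": p.open_space_ha / p.area_ha,        # share of land left open
    }


# Made-up sample values, not data from the paper.
precinct = Precinct(area_ha=50.0, population=4000,
                    building_footprint_ha=20.0, open_space_ha=10.0)
indicators = density_indicators(precinct)
print(indicators)
```

A function like this only needs one input structure, which is the practical benefit of harmonising the source layers first.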

Discussion & conclusions

Heterogeneity-eliminated datasets offer urban researchers and planners a simple way for data analysis, visualisation, and interpretation. Our proposed semantic data harmonisation and consolidation procedure tackles this problem by exploiting the stability and comprehensiveness of ontology. Once data is semantically enriched, analysis tools that conform to a uniform ontology schema can be concisely designed, without using hardcoded logic to resolve diverse structures of inputs. It shows a

References (36)

  • C. Catlett

    Plenario: An open data discovery and exploration platform for urban science

    IEEE Computer Society Technical Committee on Data Engineering

    (2014)
  • J. Day et al.

    A free, open-source tool for identifying urban agglomerations using point data

    Spatial Economic Analysis

    (Mar. 2016)
  • J. Day et al.

    A free, open-source tool for identifying urban agglomerations using polygon data

    Environment Systems and Decisions

    (2017)
  • R. Dobbs et al.

    Urban world: Mapping the economic power of cities

    (2011)
  • K. Dovey et al.

    The urban density assemblage: Modelling multiple measures

    URBAN DESIGN International

    (2014)
  • G. Falquet et al.

    Ontologies in urban development projects

    (2011)
  • F.T. Fonseca et al.

    Ontologies and knowledge sharing in urban GIS

  • A. Gómez-Pérez et al.

    Ontological engineering: With examples from the areas of knowledge management, e-commerce and the Semantic Web

    (2004)

Cited by (24)

    • Transport sustainability indicators for an enhanced urban analytics data infrastructure

      2020, Sustainable Cities and Society
      Citation Excerpt :

      The ontology interconnects data layers within the UADI and provides heterogeneous data layers with a uniformed data structure described by the ontology schema and eventually makes data easy to use for further analysis. The detail of this process is explained in Chen et al. (2018). The UADI utilises ‘Ontology’ as its core component.

    • The design and practice of a semantic-enabled urban analytics data infrastructure

      2020, Computers, Environment and Urban Systems
      Citation Excerpt :

      Attribute level metadata is critical for understanding the data structure and hence assisting in the semantic enrichment process. The semantic enrichment process in the UADI comprises two mapping steps (Chen et al., 2018). The first mapping step is “data-concept” level mapping.

    • Smart Dubai IoT strategy: Aspiring to the promotion of happiness for residents and visitors through a continuous commitment to innovation

      2020, Smart Cities for Technological and Social Innovation: Case Studies, Current Trends, and Future Steps
    • Ontology-based knowledge representation of urban heat island mitigation strategies

      2020, Sustainable Cities and Society
      Citation Excerpt :

      Developing an ontology-based prototype for domain experts (such as hydrologists and urban planners) is relatively easy. Chen et al. (2018) showed that two urban researchers without knowledge about ontology were able to build ontological terminologies and relationships after only three days of training. Therefore, our prototype has great potential to support urban disaster mitigation.
