Elsevier

Neurocomputing

Volume 161, 5 August 2015, Pages 3-16
Logics based on qualitative descriptors for scene understanding

https://doi.org/10.1016/j.neucom.2015.01.074

Abstract

An approach for scene understanding based on qualitative descriptors, domain knowledge and logics is proposed in this paper. Qualitative descriptors, that is, qualitative models of shape, colour, topology and location, are used for describing any object in the scene. Two kinds of domain knowledge are provided: (i) categorizations of objects according to their qualitative descriptors, and (ii) semantics for describing the affordances, mobility and other functional properties of target objects. First order logics are obtained for reasoning and scene understanding. Tests were carried out in the Interact@Cartesium scenario and promising results were obtained.

Introduction

In an envisaged future in which we live in smart homes and intelligent robots populate our world, we imagine an everyday phone conversation with our home ambient intelligent vision system (or robot), asking it “Is everything fine?”, which the system happily confirms. However, whether everything is fine at home from the human perspective is not a simple observable fact for a machine. It requires cognitive interpretation. For an artificial intelligent agent to reach such a conclusion in alignment with the human, the agent needs to be able to observe its environment and interpret the perceived information in a cognitive manner. Let us continue with the example and consider that our robot detects a patch of water on the floor: should that be considered normal? Clearly, for a human, the interpretation of such observed information varies depending on where and when the patch of water was recognized: a patch of water on the bathroom floor close to the shower can be a typical outcome of normal use, while a patch of water on parquet flooring next to a window after a rainy day is a clear indication of something wrong (e.g. a broken window or a leaking flowerpot). Interpreting the meaning of a feature occurrence like the patch of water thus requires consideration of context. Due to the variety of possible situations, reasoning is required in order to state a hypothesis and reach a conclusion based on a manageable knowledge base which can differentiate normal from abnormal situations. In order to realize a system like the one sketched above, it is necessary to integrate perception with abstract reasoning in a way which matches human interpretation.

Ambient intelligent (AmI) systems and companion robots interacting with human beings are the scenarios proposed in this paper. These systems usually integrate digital cameras, from which they obtain information about the environment. As such systems need to align their interpretation with the human interpretation, one can go a step deeper and assume that the ideal systems for interacting with people would be those capable of interpreting their environment (captured by a digital image) cognitively (that is, similarly to how people do it). In this way, those systems may directly align their concepts with human concepts and thus provide common ground for establishing sophisticated communication.

Psychological studies on image description point to the fact that people generally find the most relevant content and use words (qualitative tags) to describe it [1], [2], [3], [4]. Usually different colours/textures in an image indicate different objects/regions of interest to people [5]. Moreover, cognitive studies [6] explain that qualitative representations of images are in many ways similar to the mental images that people report when they describe what they have seen or when they attempt to answer questions based on visual memories [7].

Because digital images represent visual data numerically, most image processing has been successfully carried out by applying mathematical techniques to obtain and describe image content. Some examples are object feature invariant descriptors and detectors, such as SIFT and SURF (see the work by Mikolajczyk et al. [8] for an overview). All these approaches succeeded in extracting and using features from digital images for describing complex real-world objects and then detecting them within other images. However, these approaches need to produce and store in memory huge numerical descriptions that cannot be interpreted or given a meaning. To establish semantics, features first need to be grouped and linked to cognitive concepts. A disadvantage of these feature detectors is that they require a repository of all possible images of the objects existing in a scenario for identification, because they lack the ability to describe any feature of an object that they have not seen before. Qualitative approaches for object description in digital images are successful at identifying simple objects, but they may be ambiguous when detecting complex objects in the real world, since these approaches use abstractions of features which may sometimes produce categorizations that are too general [9], [10]. These abstractions can, however, be complemented with spatial features (e.g. topology, distance, direction, location) for disambiguation. Advantages of qualitative descriptors are that they can be applied to: (i) describe objects which are ‘unknown’ to the system (i.e. not stored in memory, not seen before in a scenario) and (ii) identify them by matching their features without any previous training. Qualitative representations have also been successfully employed in querying spatial databases [11]. The main aim here is to analyse whether they can enhance cognitive image/scene descriptions with reasoning, since qualitative relations have already been acknowledged for their contribution to object recognition [12], [13] and qualitative image description (QID) [14].

In the literature, QID has been applied to extract qualitative features from real-world scenarios such as (i) images captured by the webcam of a mobile robot [15], integrated also with qualitative distances for representing indoor scenes in robotics [16], and (ii) images of tiles captured by a camera located on a robotic arm used to detect tile pieces and assemble mosaics [9]. The QID description was designed for recognition by matching; QID-Ontology [10], [17] was then designed for formalizing the meaning of the representation and for categorizing objects by their qualitative shape and colour using description logics. In this paper, the QID approach is extended and redesigned to enhance its reasoning capabilities by obtaining a qualitative image description in first order logics (QIDL) of the shape, colour, topology and location of the objects in scenes captured by digital images. These descriptions are integrated with domain knowledge for scene understanding in an ambient intelligent system scenario, the Interact@Cartesium at the Universität Bremen.

The rest of the paper is organized as follows. In Section 2, the related work in the literature is presented and discussed. In Section 3, the approach for obtaining a qualitative image description in first order logics (QIDL) in Prolog [18] is outlined. Section 4 describes how the qualitative descriptors of shape, colour, topology and location are extracted. Section 5 presents the facts in first order logic which are obtained from the qualitative descriptors. Section 6 explains the domain knowledge provided to the system and Section 7 presents the logics defined. In Section 8, tests and results are presented. Section 9 discusses the results and finally, conclusions and future work are drawn in Section 10.

Section snippets

Related work

The amount of scientific literature on recognition and action for scene understanding is increasing and is bringing together researchers from multidisciplinary fields such as computer vision, artificial intelligence and cognition.

Some approaches to scene understanding use low-level scene features extracted from a cognitive perspective in order to model visual attention [19], [20], [21], [22]. Carbone and Baccino [19] presented histograms of motion field orientations as a global motion information

An approach to generate a qualitative description image logic (QIDL)

The QIDL approach is outlined in Fig. 1. For extracting qualitative descriptors from the input image, a graph-based region segmentation method [51] is applied and then the closed boundaries of the relevant regions detected are extracted. For each of the regions, qualitative descriptors of shape, colour, topology and location are obtained, as described in Section 4. From these descriptors, first order logics in Prolog [18] syntax are obtained, as shown in Section 6. These logic descriptors of

Obtaining qualitative image descriptors (QIDs)

In the QID approach [14], each object/region extracted is described qualitatively by its shape and its colour, as explained in Sections 4.1 (Qualitative shape descriptors, QSD) and 4.3 (Qualitative colour descriptors, QCD). The spatial object description is composed of a topological description (Section 4.4) and a location description (Section 4.5).
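
As a rough illustration of what a topological and a location description of an object in the 2D image space could look like, the following Python sketch derives both from axis-aligned bounding boxes. It is a simplification under stated assumptions and not the model of Sections 4.4 and 4.5: the `Box` type, the relation labels (`disjoint`, `touching`, `inside`, `container`, `overlapping`), the location labels and the 3×3 image grid are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downwards)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def topology(a: Box, b: Box) -> str:
    """Return an illustrative topological label for box a with respect to box b."""
    # The boxes share no point at all.
    if (a.x_max < b.x_min or b.x_max < a.x_min
            or a.y_max < b.y_min or b.y_max < a.y_min):
        return "disjoint"
    # a lies strictly within b.
    if (a.x_min > b.x_min and a.y_min > b.y_min
            and a.x_max < b.x_max and a.y_max < b.y_max):
        return "inside"
    # b lies strictly within a.
    if (b.x_min > a.x_min and b.y_min > a.y_min
            and b.x_max < a.x_max and b.y_max < a.y_max):
        return "container"
    # Boundaries meet (edge or corner) but the interiors do not overlap.
    overlap_w = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    overlap_h = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    if overlap_w == 0 or overlap_h == 0:
        return "touching"
    return "overlapping"


def location(a: Box, width: float, height: float) -> str:
    """Return an illustrative location label for the centre of a on a 3x3 grid."""
    cx = (a.x_min + a.x_max) / 2
    cy = (a.y_min + a.y_max) / 2
    horiz = "left" if cx < width / 3 else "right" if cx > 2 * width / 3 else "centre"
    vert = "up" if cy < height / 3 else "down" if cy > 2 * height / 3 else "centre"
    if horiz == "centre" and vert == "centre":
        return "centre"
    if vert == "centre":
        return horiz
    if horiz == "centre":
        return vert
    return f"{vert}_{horiz}"
```

For example, `topology(Box(120, 120, 180, 180), Box(0, 0, 300, 300))` yields `inside`, and `location(Box(0, 0, 50, 50), 300, 300)` yields `up_left`. Real regions are not rectangles, so a faithful implementation would work on the extracted region boundaries instead.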

To build this representation, the object is considered to be positioned in the 2D image space, allowing the rich repertoire of existing spatial representations to

Generating logics based on QID

The approach presented in this paper generates first order logics related to all the objects in the image. Prolog syntax has been used for expressing these logics, as described in Table 1, where α1 to α7 use the variables Image, Object and P, which represent any image, any object inside the image, and any point belonging to an object in the image, respectively. The predicate hasQSDpoint relates any point to the object to which it belongs and also to its coordinates in the image space (see
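
To make the idea of facts in Prolog syntax concrete, the Python sketch below writes out one fact per boundary point of an object. Only the predicate name hasQSDpoint comes from the text; its arity and argument order (object, point, x, y), the `hasObject` predicate and the point-naming scheme are assumptions made for this example, since Table 1 is not reproduced here.

```python
def prolog_atom(name: str) -> str:
    """Return name as a Prolog atom, quoting it unless it is a plain lowercase identifier."""
    plain = (
        name != ""
        and name.isascii()
        and name[0].islower()
        and all(c.isalnum() or c == "_" for c in name)
    )
    if plain:
        return name
    # Inside a quoted atom, backslashes and single quotes are escaped.
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"


def facts_for_object(image: str, obj: str, points: list) -> list:
    """Build Prolog-syntax facts for one object and its boundary points.

    hasObject/2 and the argument order of hasQSDpoint/4 are hypothetical.
    """
    lines = [f"hasObject({prolog_atom(image)}, {prolog_atom(obj)})."]
    for index, (x, y) in enumerate(points, start=1):
        point = f"{obj}_p{index}"  # hypothetical naming scheme for points
        lines.append(
            f"hasQSDpoint({prolog_atom(obj)}, {prolog_atom(point)}, {x}, {y})."
        )
    return lines
```

Calling `facts_for_object("img1", "obj1", [(10, 20), (30, 40)])` returns three lines, the last of which is `hasQSDpoint(obj1, obj1_p2, 30, 40).`, ready to be written to a file and consulted by a Prolog interpreter.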

Incorporating domain knowledge

The domain knowledge included in this approach consists of (i) logic definitions to categorize objects based on their qualitative descriptors (Section 6.1), (ii) images of ‘target’ objects which are ‘known’ by the agent and can be detected using a feature invariant descriptor and detectors (Section 6.2) and (iii) semantics for target objects regarding their use, dynamics, expected location, etc. (Section 6.3).
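
The first and third kinds of domain knowledge can be pictured as two small lookup structures: one mapping categories to the qualitative descriptors they require, and one attaching semantics to target objects. The Python sketch below is illustrative only; every category, descriptor value and semantic attribute in it is a made-up placeholder, not content from Sections 6.1 or 6.3.

```python
# Category name -> required qualitative descriptors (all values are placeholders).
CATEGORIES = {
    "wall": {"colour": {"white", "grey"}, "shape": {"quadrilateral"}},
    "floor": {"colour": {"brown"}, "location": {"down"}},
}

# Target object -> semantics about its use, dynamics and expected location (placeholders).
TARGET_SEMANTICS = {
    "mug": {
        "affordance": "drinking",
        "mobile": True,
        "expected_location": {"centre", "down"},
    },
}


def categorise(descriptors: dict) -> list:
    """Return every category whose required descriptors all match the object."""
    return sorted(
        name
        for name, required in CATEGORIES.items()
        if all(descriptors.get(key) in allowed for key, allowed in required.items())
    )


def is_where_expected(target: str, observed_location: str) -> bool:
    """Check an observed location against the target's expected locations."""
    semantics = TARGET_SEMANTICS.get(target)
    return semantics is not None and observed_location in semantics["expected_location"]
```

An object described as `{"colour": "white", "shape": "quadrilateral", "location": "up"}` is categorised as `["wall"]`, while `is_where_expected("mug", "up")` is false, which is the kind of mismatch a reasoning step could flag for interpretation.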

Reasoning using the qualitative image description logics (QIDL)

Some logics are provided to the agent for scene interpretation and understanding. First, this helps to avoid false positives, as the qualitative features of the objects are also considered in the matching process according to the domain knowledge provided. For example, a target object will not be matched to an object whose colour is different from the one provided in the domain knowledge. This pruning can be calculated with or without a certainty.

pruning_target_object_matching(TargetName,
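
The colour-based pruning described above can be sketched in Python as a filter over candidate matches. This is one possible reading, not the Prolog rule itself: the function name, the tuple layout `(target, object, certainty)` and the interpretation of "with or without a certainty" as an optional minimum threshold are all assumptions.

```python
from typing import Optional


def prune_matches(
    matches: list,
    target_colours: dict,
    object_colours: dict,
    min_certainty: Optional[float] = None,
) -> list:
    """Keep only matches whose object colour agrees with the domain knowledge.

    matches: (target_name, object_id, certainty) tuples from a feature detector.
    target_colours: expected colour per target, taken from the domain knowledge.
    object_colours: qualitative colour observed for each object in the image.
    min_certainty: optional threshold; None means certainty is ignored.
    """
    kept = []
    for target, obj, certainty in matches:
        expected = target_colours.get(target)
        # Drop the match when the target is unknown or the colours differ.
        if expected is None or object_colours.get(obj) != expected:
            continue
        # Optionally drop matches below the certainty threshold.
        if min_certainty is not None and certainty < min_certainty:
            continue
        kept.append((target, obj, certainty))
    return kept
```

With candidates `[("mug", "o1", 0.9), ("mug", "o2", 0.8), ("mug", "o3", 0.4)]`, a red target mug and observed colours red, blue and red, the call without a threshold keeps `o1` and `o3`, and adding `min_certainty=0.5` keeps only `o1`.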

Experimentation and results

The considered scenario involves an agent which can be integrated into a robot or an intelligent ambient system. As a robot, any mobile or humanoid system which includes a digital camera could be used. As an AmI system, the Interact@Cartesium system located in the Cartesium building at Universität Bremen, Germany, is considered, which incorporates intelligent door tags (computers) installed in the walls next to every office of the CoSy group (see Fig. 6) from which video cameras can take

Discussion

The results of the experimentation and the methods used are discussed in Sections 9.1 and 9.2, respectively, and then a possible benchmarking of the QIDL approach in the RoboCup@Home competition is explained in Section 9.3.

Conclusions and future work

In this paper, qualitative image descriptors of shape, colour, topology and location are combined with domain knowledge and feature detectors for improving categorizations of objects (i.e. target objects or unknown objects with surfaces without textures).

Moreover, semantics are provided to target objects for describing their affordances, mobility and other functional properties. Logics have also been defined for reasoning about the provided semantics and interpreting scenes for

Acknowledgements

This work was conducted within the scope of the project COGNITIVE-AMI (GA 328763) funded by the European Commission through FP7 Marie Curie IEF actions. The support by the Universität Bremen and the Interdisciplinary Transregional Collaborative Research Center SFB/TR 8 is also acknowledged by Dr.-Ing. Zoe Falomir.

The support by the Cognitive Systems Research Center and by the German Research Foundation (DFG) is gratefully acknowledged by Ana-Maria Olteţeanu.

We would also like to thank the reviewers

References (71)

  • J. Duncan et al., Competitive brain activity in visual attention, Curr. Opin. Neurobiol. (1997)
  • D. Vernon, Image and vision computing special issue on cognitive vision, Image Vis. Comput. (2008)
  • H. Bay et al., Speeded-up robust features (SURF), Comput. Vis. Image Understand. (2008)
  • M. Laine-Hernandez, S. Westman, Image semantics in the description and categorization of journalistic photographs, in:...
  • H. Greisdorf et al., Modelling what users see when they look at images: a cognitive viewpoint, J. Doc. (2002)
  • X. Wang, P. Matsakis, L. Trick, B. Nonnecke, M. Veltman, A study on how humans describe relative positions of image...
  • S. Palmer, Vision Science: Photons to Phenomenology (1999)
  • C. Freksa, Qualitative spatial reasoning
  • S.M. Kosslyn et al., The Case for Mental Imagery (2006)
  • K. Mikolajczyk et al., A comparison of affine region detectors, Int. J. Comput. Vis. (2005)
  • Z. Falomir et al., Describing images using qualitative models and description logics, Spat. Cogn. Comput. (2011)
  • J.O. Wallgrün, D. Wolter, K.-F. Richter, Qualitative matching of spatial information, in: Proceedings of the 18th...
  • Z. Falomir et al., Measures of similarity between objects from a qualitative shape description, Spat. Cogn. Comput. (2013)
  • W.E.L. Grimson, Object Recognition by Computer: The Role of Geometric Constraints (1990)
  • Z. Falomir, L. Museros, L. Gonzalez-Abril, Towards a similarity between qualitative image descriptions for comparing...
  • Z. Falomir, Towards scene understanding using contextual knowledge and spatial logics, in: J. Dias, F. Escolano, R....
  • L. Sterling et al., The Art of Prolog: Advanced Programming Techniques (1994)
  • A. Carbone, T. Baccino, Histograms of motion field orientations as a gist descriptor for the prediction of eye...
  • R. Marfil, E. Antúnez, F. Arrebola, A. Bandera, Towards active image segmentation: the foveal bounded irregular...
  • A. Palomino, R. Marfil, J. Bandera, A. Bandera, Multi-feature bottom-up processing and top-down selection for an...
  • R. Martins, J. Ferreira, J. Dias, Touch attention bayesian models for robotic active haptic exploration, in: J. Dias,...
  • A. Oliva et al., Modeling the shape of the scene: a holistic representation of the spatial envelope, Int. J. Comput. Vis. (2001)
  • A. Quattoni, A. Torralba, Recognizing indoor scenes, in: IEEE Conf. on Computer Vision and Pattern Recognition, IEEE...
  • Z.U. Qayyum, A.G. Cohn, Image retrieval through qualitative representations over semantic features, in: Proceedings of...
  • S. Aksoy, C. Tusk, K. Koperski, G. Marchisio, Scene Modeling and Image Mining with a Visual Grammar, 2003, Chapter 3,...
Zoe Falomir is a post-doctoral Marie Curie fellow at the Cognitive Systems Research Group, Spatial Cognition Centre, at the Universität Bremen. She is the principal investigator of the COGNITIVE-AMI project funded by the European Union under FP7-People. She graduated in Computer Science Engineering at Universitat Jaume I (UJI), Castellón, Spain, in 2004. In 2006, Zoe was awarded a grant by Generalitat Valenciana (Spain) to carry out her PhD thesis in Qualitative Representations applied to Robotics. In 2011, she became Dr. in Computer Science at UJI and Dr.-Ing. in Informatik at Universität Bremen. From 2010 to 2011, she was a research engineer at Cognitive Robots SL (Castellón, Spain), where she applied some of the results of her PhD thesis to the automation of scrubber machines and automatic mosaic assembling. In 2012, Zoe won a Post-Doc position funded by the Excellence Initiative at Universität Bremen, and in 2013 she won a Marie Curie Intra-European Fellowship for Career Development.

She is a main board member of the association of Spanish researchers in Germany (CERFA), and she was a main board member of the Catalan Association for Artificial Intelligence (ACIA) from 2010 to 2014. She is also a member of the Cognitive Science Society (CSS) and of the Spatial Intelligence Learning Center (SILC) funded by the National Science Foundation (NSF). In 2014 she received the Castelló City Award for the best work on Experimental Sciences and Technology and then the Prize for the most outstanding PhD thesis at Universitat Jaume I for the academic year 2011/2012. Her main research interests are cognitive systems, spatial reasoning, qualitative modeling and reasoning, logics, similarity matching, computer vision, knowledge extraction, and applied ontology methods.

Ana-Maria Olteţeanu is a doctoral researcher at the Cognitive Systems Research Group SFB/TR 8, Spatial Cognition, at the Universität Bremen. Previously a concert pianist (graduating at Universitatea de Vest, Timișoara, Romania, 2005), she became interested in Cognitive Science while undertaking her first doctorate in Musicology at the National University of Music, Bucharest, Romania (summa cum laude awarded in 2011), which dealt with the cognitive relation between musical form and aesthetic expression and emotion. Meanwhile she furthered her education in Computer Science with a Masters in Cognitive Computing at Goldsmiths, University of London, UK (MSc with Distinction, 2010). In 2011, she joined the Cognitive Systems group in Bremen, working on a thesis which deals with cognitively inspired representation structures for artificial systems that use visuospatial abilities for creative problem-solving. Ana was a Fellow of the Heidelberg Laureate Forum in 2013, an event bringing together Abel, Fields and Turing Laureates with the most promising mathematics and informatics researchers. She is a member of the Cognitive Science Society (CSS) and The Society for the Study of Artificial Intelligence and the Simulation of Behaviour (AISB). Ana's main research interests are cognitive systems, problem-solving, knowledge representation structures, human and computer vision, and spatial cognition.
