Elsevier

Ecological Informatics

Volume 26, Part 2, March 2015, Pages 127-139
A knowledge discovery process for spatiotemporal data: Application to river water quality monitoring

https://doi.org/10.1016/j.ecoinf.2014.05.011

Highlights

  • We have presented a knowledge discovery process on hydrological data.

  • We have proposed an objective measure of validation.

  • We have applied a similarity measure to compare the extracted patterns.

Abstract

Rapid population growth and the development of human activities (such as agriculture, industry, and transport) have increased the vulnerability of water resources. Because of the complexity of natural processes and the numerous interactions between hydro-systems and human pressures, water quality is difficult to quantify. In this context, we present a knowledge discovery process applied to hydrological data. We combine successive methods to extract knowledge from data collected at stations located along several rivers. First, the data are pre-processed to obtain different spatial proximities. Next, we apply a standard algorithm to extract sequential patterns. Finally, we propose a combination of two techniques: (1) filtering patterns based on an interest measure, and (2) grouping and presenting them graphically to help the experts. Such elements can be used to assess spatialized indicators that assist the interpretation of ecological and river monitoring pressure data.

Introduction

Improvements in digital data collection devices and data storage technology have allowed companies and organizations to store ever larger amounts of data, which makes manual analysis increasingly difficult. New techniques have therefore been developed to automatically turn this volume of data into useful knowledge that enables a better understanding of the phenomena occurring in the environment. These techniques make up knowledge discovery in databases (KDD), which is characterized as a multi-step process for discovering valid, novel and potentially useful information.

Natural phenomena involve both spatial and temporal components. For example, in environmental contexts, river pollution is a phenomenon that is observed by measuring physicochemical and biological indicators of water quality. These indicators, which evolve over time, depend explicitly on the location of the sampling stations placed strategically along several rivers.

Although systems dedicated to water quality monitoring have existed for several decades, the challenge now is to define indicators that take into account the impact of water uses and of water quality restoration measures. In this context, to build an efficient tool, spatial relations, both metric (e.g., distance) and non-metric (e.g., topology, location), and temporal relations (e.g., before or after) must be considered in the KDD process in order to better understand spatiotemporal phenomena.

In this paper, our objective is to analyze water quality in the hydrological network of the Saône watershed (located in the east of France, see Fig. 1). To achieve this goal, we describe a KDD process for hydrological data consisting of: (1) a pre-processing step that transforms the data by grouping stations according to different spatial proximities (e.g., distance between stations or membership in a common area); (2) a second step dedicated to the extraction of sequential patterns in order to take the temporal aspect into account; and (3) a post-processing step that applies a new interest measure, called the least temporal contradiction, to filter sequences and retain only those least contradicted over time. This measure is coupled with a second technique that determines the degree of similarity between the extracted patterns and groups them.
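As an illustration of steps (1) and (2), the sketch below groups stations by a distance threshold, builds one time-ordered sequence of itemsets per station, and computes the support of a candidate sequential pattern in the standard sense (the fraction of sequences that contain the pattern in order). It is a minimal sketch under stated assumptions, not the implementation used in the paper: the function names, the planar coordinates, the `radius` parameter, and the item labels such as `nitrate_high` are all hypothetical. The least temporal contradiction measure is not sketched because its definition is not given in this excerpt.

```python
from collections import defaultdict
from math import hypot


def group_stations_by_distance(coords, radius):
    """Map each station to the set of stations within `radius` (itself included).

    Assumption: `coords` holds planar (x, y) positions; a real network would
    more likely use distances along the river.
    """
    return {
        s: {t for t, (x2, y2) in coords.items() if hypot(x1 - x2, y1 - y2) <= radius}
        for s, (x1, y1) in coords.items()
    }


def build_sequences(records, groups):
    """Build one sequence per station from (station, date, item) records.

    Each sequence is a date-ordered list of itemsets; the itemset for a date
    merges the items observed at the station and at its neighbours.
    """
    items_at = defaultdict(set)
    for station, date, item in records:
        items_at[(station, date)].add(item)
    dates = sorted({date for _, date in items_at})
    sequences = {}
    for station, neighbours in groups.items():
        sequence = []
        for date in dates:
            itemset = set()
            for n in neighbours:
                itemset |= items_at.get((n, date), set())
            if itemset:
                sequence.append(frozenset(itemset))
        sequences[station] = sequence
    return sequences


def contains(sequence, pattern):
    """Return True if `pattern` (a list of itemsets) occurs in order in `sequence`."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain `pattern`."""
    if not sequences:
        return 0.0
    return sum(contains(seq, pattern) for seq in sequences.values()) / len(sequences)


# Hypothetical toy data: three stations, two sampling dates.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (10.0, 0.0)}
records = [
    ("A", 1, "nitrate_high"),
    ("B", 2, "ibgn_low"),
    ("C", 1, "nitrate_high"),
    ("C", 2, "ibgn_high"),
]
pattern = [{"nitrate_high"}, {"ibgn_low"}]

near = build_sequences(records, group_stations_by_distance(coords, radius=2.0))
alone = build_sequences(records, group_stations_by_distance(coords, radius=0.0))
print(support(near, pattern), support(alone, pattern))
```

In this toy data the pattern is found only when stations A and B are grouped, which illustrates how the choice of spatial proximity can change which patterns are extracted.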

This paper is organized as follows: in Section 2, we present a brief overview of knowledge discovery in spatiotemporal data. Then, in Section 3, we describe a framework for extracting knowledge. The experiments performed are described in Section 4. Finally, in Section 5, we summarize the results of our proposals and highlight short- and medium-term perspectives.

Section snippets

Related work

Knowledge discovery in databases (KDD) is a dynamic research field. Fayyad et al. (1996) presented the most widely used KDD framework and provided a broad overview of knowledge discovery techniques. They described KDD as a set of interactive and iterative steps: data selection, pre-processing, transformation, data mining, and post-processing or interpretation. As mentioned by Fayyad et al. (1996), the basic problem addressed by the KDD process is one of mapping low-level data into other

A framework for mining spatiotemporal data

In this section, we describe the steps of the general process used to extract knowledge from spatiotemporal databases.

Application to hydrological data

In this section, we describe the application of our spatiotemporal knowledge discovery process to hydrological data of the Saône watershed.

Conclusion and future research directions

In this paper, we have presented a knowledge discovery process for hydrological data. In particular, we have applied a conventional algorithm for sequential pattern extraction according to three spatialization approaches. We highlighted the problems raised by the choice of spatialization and its influence on the number of extracted patterns. We have proposed an objective measure of validation: the least temporal contradiction measure which provides experts with an

References (39)

  • O. Brazhnik

    Databases and the geometry of knowledge

    Data Knowl. Eng.

    (2007)
  • R. Agrawal et al.

    Mining sequential patterns

  • J. Azé

    Une nouvelle mesure de qualité pour l'extraction de pépites de connaissances

  • J. Azé et al.

    A study of the robustness of association rules

  • E. Bertini et al.

    Investigating and reflecting on the integration of automatic data analysis and visualization in knowledge discovery

  • T. Bie

    Subjective interestingness in exploratory data mining

  • V. Bogorny et al.

    Spatial data preparation for knowledge discovery

    IEEE Comput. Graph.

    (2005)
  • D. Bollegala et al.

    A web search engine-based approach to measure semantic similarity between words

    IEEE Trans. Knowl. Data Eng.

    (2011)
  • H. Cao et al.

    Mining frequent spatio-temporal sequential patterns

  • L. Cao et al.

    Combined mining: discovering informative knowledge in complex data

    IEEE Trans. Syst. Man Cybern. B

    (2011)
  • M. Capelle et al.

    Mining frequent sequential patterns under regular expressions: a highly adaptive strategy for pushing constraints

  • M. Celik et al.

    Mixed-drove spatio-temporal co-occurrence pattern mining: a summary of results

  • D.-A. Chiang et al.

    Mining interval sequential patterns

    Int. J. Intell. Syst.

    (2005)
  • B. Elias

    Extracting landmarks with data mining methods

  • M. Ester et al.

    Algorithms and applications for spatial data mining

    Geogr. Data Min. Knowl. Discov.

    (2001)
  • U.M. Fayyad et al.

    Advances in Knowledge Discovery and Data Mining

    (1996)
  • C. Fiot et al.

    Evolution patterns and gradual trends

    Int. J. Intell. Syst.

    (2009)
  • L. Fleury et al.

    Some aspects of rule discovery in data bases

  • J. Han et al.

    FreeSpan: frequent pattern-projected sequential pattern mining
