A knowledge discovery process for spatiotemporal data: Application to river water quality monitoring
Introduction
Improvements in digital data collection devices and data storage technology have allowed companies and organizations to store ever larger amounts of data, making manual analysis increasingly difficult. New techniques have therefore been developed to help humans automatically turn this volume of data into useful knowledge that gives a better understanding of phenomena occurring in their environment. Together, these techniques make up knowledge discovery in databases (KDD), which is characterized as a multi-step process for discovering valid, novel and potentially useful information.
Natural phenomena involve both spatial and temporal components. In environmental contexts, for example, river pollution is observed by measuring physicochemical and biological water quality indicators. These indicators evolve over time and depend explicitly on the location of the sampling stations, which are strategically placed along several rivers.
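As a minimal sketch of this kind of data, assuming each observation is a (station, date, indicator, value) record, the Python code below orders the measurements of each station by date and turns each sampling date into an itemset of discretized indicator classes, a common input format for sequential pattern mining. The `Measurement` record, the `quality_class` and `station_sequences` helpers, the class thresholds and the toy values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Measurement:
    station: str     # hypothetical sampling-station identifier
    day: date        # sampling date
    indicator: str   # water quality indicator name, e.g. "nitrates"
    value: float     # measured value


def quality_class(value: float, thresholds: list[float]) -> int:
    """Return the index of the first threshold above `value` (len(thresholds) if none)."""
    for i, limit in enumerate(thresholds):
        if value < limit:
            return i
    return len(thresholds)


def station_sequences(measurements, thresholds):
    """Build one date-ordered sequence of itemsets per station.

    Each itemset gathers the (indicator, class) pairs observed at that station on one date.
    `thresholds` maps each indicator to its (hypothetical) class boundaries.
    """
    by_station = defaultdict(lambda: defaultdict(set))
    for m in measurements:
        item = (m.indicator, quality_class(m.value, thresholds[m.indicator]))
        by_station[m.station][m.day].add(item)
    return {
        station: [frozenset(days[d]) for d in sorted(days)]
        for station, days in by_station.items()
    }


# Toy, made-up measurements for two stations.
data = [
    Measurement("S1", date(2020, 3, 1), "nitrates", 12.0),
    Measurement("S1", date(2020, 1, 1), "nitrates", 30.0),
    Measurement("S1", date(2020, 1, 1), "oxygen", 6.0),
    Measurement("S2", date(2020, 2, 1), "nitrates", 5.0),
]
thresholds = {"nitrates": [10.0, 25.0], "oxygen": [5.0, 8.0]}
seqs = station_sequences(data, thresholds)
# seqs["S1"] == [frozenset({("nitrates", 2), ("oxygen", 1)}), frozenset({("nitrates", 1)})]
```

The class index only encodes which interval a value falls in; whether a higher class means better or worse quality depends on the indicator and would have to be fixed by domain experts.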
Although systems dedicated to water quality monitoring have existed for several decades, the challenge now is to define indicators that take into account the impact of water uses and of water quality restoration measures. To build an efficient tool in this context, the KDD process must consider spatial relations, both metric (e.g., distance) and non-metric (e.g., topology, location, …), as well as temporal relations (e.g., before or after), in order to better understand spatiotemporal phenomena.
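The three kinds of relations named above can be illustrated with small predicates. The following is a minimal Python sketch, assuming stations are described by latitude/longitude and by an area label: the metric relation is a great-circle (haversine) distance on a spherical Earth of radius 6371 km, the non-metric relation is membership in a common area, and the temporal relation is a strict `before` on dates. The function names and area labels are illustrative assumptions; for a hydrological network, a distance measured along the rivers may be more appropriate than a straight-line distance.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Metric relation: great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def same_area(area_of: dict[str, str], s1: str, s2: str) -> bool:
    """Non-metric relation: both stations carry the same (hypothetical) area label."""
    return area_of[s1] == area_of[s2]


def before(d1: date, d2: date) -> bool:
    """Temporal relation: d1 strictly precedes d2."""
    return d1 < d2


# One degree of longitude along the equator is roughly 111.2 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))
print(same_area({"S1": "upstream", "S2": "upstream"}, "S1", "S2"))
print(before(date(2020, 1, 1), date(2020, 6, 1)))
```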
In this paper, our objective is to analyze water quality in the hydrological network of the Saône watershed (in eastern France, see Fig. 1). To achieve this goal, we describe a KDD process for hydrological data consisting of: (1) a pre-processing step that transforms the data by grouping stations according to different notions of spatial proximity (distance, membership in a common area, …); (2) a step dedicated to the extraction of sequential patterns, in order to take the temporal aspect into account; and (3) a post-processing step based on a new interest measure, called the least temporal contradiction, which filters sequences so as to retain only those least contradicted over time. This technique is coupled with another one that determines the degree of similarity between the extracted patterns and groups similar patterns together.
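The station grouping of the pre-processing step can be sketched as follows. This minimal Python example assumes two of the proximity notions listed above: stations linked by a chain of pairwise distances no greater than a threshold are merged with a union-find structure (a single-linkage grouping), and stations sharing an area label form one group. The helper names, the threshold and the toy 1-D positions are hypothetical; the spatialization approaches used in the paper may group stations differently.

```python
from collections import defaultdict


def group_by_distance(stations, dist, max_km):
    """Single-linkage grouping: stations within max_km of each other, directly or
    through a chain of neighbours, end up in the same group. `dist` is any distance
    function on station identifiers."""
    parent = {s: s for s in stations}

    def find(s):
        # Follow parent links to the group root, halving the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for i, a in enumerate(stations):
        for b in stations[i + 1:]:
            if dist(a, b) <= max_km:
                parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(list)
    for s in stations:
        groups[find(s)].append(s)
    return sorted(sorted(g) for g in groups.values())


def group_by_area(area_of):
    """Group stations that share the same area label (e.g. a sub-basin)."""
    groups = defaultdict(list)
    for station, area in area_of.items():
        groups[area].append(station)
    return sorted(sorted(g) for g in groups.values())


# Toy positions (km along a line) for four hypothetical stations.
pos = {"A": 0.0, "B": 4.0, "C": 7.0, "D": 20.0}
print(group_by_distance(list(pos), lambda a, b: abs(pos[a] - pos[b]), max_km=5.0))
print(group_by_area({"A": "north", "B": "south", "C": "north", "D": "south"}))
```

Because single linkage chains neighbours together, A and C share a group here even though they are 7 km apart; a different grouping rule (e.g. a maximum group diameter) would avoid this if it is undesirable.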
This paper is organized as follows. Section 2 gives a brief overview of knowledge discovery in spatiotemporal data. Section 3 describes our framework for extracting knowledge, and Section 4 describes the experiments performed. Finally, Section 5 summarizes the results of our proposals and highlights short- and medium-term perspectives.
Related work
Knowledge discovery in databases (KDD) is a dynamic research field. Fayyad et al. (1996) presented the most widely used KDD framework and provided a broad overview of knowledge discovery techniques. In their framework, KDD is described as a set of interactive and iterative steps: data selection, pre-processing, transformation, data mining, and post-processing or interpretation. As mentioned by Fayyad et al. (1996), the basic problem addressed by the KDD process is one of mapping low-level data into other
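The chain of steps described by Fayyad et al. (1996) can be sketched as a simple composition in which each step consumes the previous step's output. In this minimal Python sketch, the `run_kdd` helper and the toy step functions are hypothetical illustrations, not part of the cited framework; the optional `trace` list keeps intermediate results so that an analyst can inspect them and rerun earlier steps, reflecting the interactive and iterative nature of the process.

```python
from typing import Any, Callable, Optional

Step = Callable[[Any], Any]


def run_kdd(data: Any, steps: list[tuple[str, Step]], trace: Optional[list] = None) -> Any:
    """Apply the named steps in order, feeding each step the previous step's output.
    If `trace` is given, (step name, result) pairs are appended to it."""
    for name, step in steps:
        data = step(data)
        if trace is not None:
            trace.append((name, data))
    return data


# Toy records and deliberately simple placeholder steps.
rows = [
    {"station": "S1", "value": 3.0},
    {"station": "S1", "value": None},
    {"station": "S2", "value": 9.0},
    {"station": "S1", "value": 5.0},
]
steps = [
    ("selection", lambda rs: [r for r in rs if r["station"] == "S1"]),
    ("pre-processing", lambda rs: [r for r in rs if r["value"] is not None]),
    ("transformation", lambda rs: [r["value"] for r in rs]),
    ("data mining", lambda values: max(values)),  # stand-in for a real mining algorithm
    ("interpretation", lambda peak: f"highest S1 value: {peak}"),
]
trace = []
result = run_kdd(rows, steps, trace)
print(result)
```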
A framework for mining spatiotemporal data
In this section, we describe the steps of the general process used to extract knowledge from spatiotemporal databases.
Application to hydrological data
In this section, we describe the application of our spatiotemporal knowledge discovery process to hydrological data of the Saône watershed.
Conclusion and future research directions
In this paper, we have presented a knowledge discovery process for hydrological data. In particular, we applied a conventional sequential pattern extraction algorithm under three spatialization approaches. We highlighted the problems raised by spatialization choices and their influence on the number of extracted patterns. We have proposed an objective validation measure, the least temporal contradiction measure, which provides experts with an
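Conventional sequential pattern extraction rests on a support computation: a pattern (a list of itemsets) is contained in a data sequence if its itemsets are included, in order, in distinct itemsets of that sequence, and its support is the fraction of data sequences that contain it. The minimal Python sketch below implements only this containment test and support-based filtering of given candidates; it is not a complete mining algorithm, and the helper names (`contains`, `support`, `frequent`) and toy data are hypothetical rather than taken from the paper.

```python
def contains(sequence, pattern):
    """True if each itemset of `pattern` is a subset of a distinct itemset of `sequence`,
    in the same order. Matching each pattern itemset as early as possible is enough
    to decide containment."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def support(database, pattern):
    """Fraction of data sequences in `database` that contain `pattern`."""
    return sum(contains(seq, pattern) for seq in database) / len(database)


def frequent(database, candidates, min_support):
    """Keep the candidate patterns whose support reaches `min_support`."""
    return [p for p in candidates if support(database, p) >= min_support]


# Toy database: one sequence of itemsets per (hypothetical) station group.
db = [
    [frozenset({"a"}), frozenset({"b", "c"})],
    [frozenset({"a", "b"}), frozenset({"c"})],
    [frozenset({"b"}), frozenset({"a"})],
]
candidates = [[{"a"}, {"c"}], [{"b"}, {"a"}]]
print([support(db, p) for p in candidates])
print(frequent(db, candidates, min_support=0.5))
```

How stations are grouped before this step changes which sequences make up `db`, and therefore the supports and the number of patterns that pass the threshold.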
References (39)
Databases and the geometry of knowledge. Data Knowl. Eng. (2007).
Mining sequential patterns.
Une nouvelle mesure de qualité pour l'extraction de pépites de connaissances [A new quality measure for extracting knowledge nuggets].
A study of the robustness of association rules.
Investigating and reflecting on the integration of automatic data analysis and visualization in knowledge discovery.
Subjective interestingness in exploratory data mining.
Spatial data preparation for knowledge discovery. IEEE Comput. Graph. (2005).
A web search engine-based approach to measure semantic similarity between words. IEEE Trans. Knowl. Data Eng. (2011).
Mining frequent spatio-temporal sequential patterns.
Combined mining: discovering informative knowledge in complex data. IEEE Trans. Syst. Man Cybern. B (2011).