Elsevier

Water Research

Volume 38, Issue 18, November 2004, Pages 3980-3992

Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India)—a case study

https://doi.org/10.1016/j.watres.2004.06.011

Abstract

This case study applies several multivariate statistical techniques to the evaluation of temporal and spatial variations in, and the interpretation of, a large and complex water-quality data set obtained by monitoring the Gomti River in northern India. Water quality of the Gomti River, a major tributary of the Ganga River, was monitored regularly over a period of 5 years (1994–1998) for 24 parameters at eight sites selected in regions of relatively low, moderate and high pollution. The complex data matrix (17,790 observations) was treated with cluster analysis (CA), factor analysis/principal component analysis (FA/PCA) and discriminant analysis (DA). CA gave good results, rendering three groups of similar sampling sites that reflect the different water-quality parameters of the river system. FA/PCA identified six factors responsible for the data structure, explaining 71% of the total variance of the data set. It allowed the selected parameters to be grouped according to common features and the incidence of each group on the overall variation in water quality to be evaluated. However, significant data reduction was not achieved, as 14 parameters were needed to explain 71% of both the temporal and spatial changes in water quality. DA gave the best results for data reduction and pattern recognition in both temporal and spatial analysis. In temporal analysis, DA needed five parameters (pH, temperature, conductivity, total alkalinity and magnesium) to afford more than 88% correct assignations, while in spatial analysis of three different regions in the basin it needed nine parameters (pH, temperature, alkalinity, Ca-hardness, DO, BOD, chloride, sulfate and TKN) to afford 91% correct assignations. Thus, DA allowed a reduction in the dimensionality of the large data set, delineating a few indicator parameters responsible for large variations in water quality. This study illustrates the necessity and usefulness of multivariate statistical techniques for the evaluation and interpretation of large complex data sets, with a view to obtaining better information about water quality and designing monitoring networks for effective management of water resources.

Introduction

Surface waters are most vulnerable to pollution because they are easily accessible for the disposal of wastewaters. Natural processes, such as precipitation inputs, erosion and weathering of crustal materials, and anthropogenic influences, viz. urban, industrial and agricultural activities and increasing exploitation of water resources, together determine the quality of surface water in a region (Carpenter et al., 1998; Jarvie et al., 1998). Rivers play a major role in assimilating or carrying off municipal and industrial wastewater and run-off from agricultural land. Municipal and industrial wastewater discharge constitutes a constant polluting source, whereas surface run-off is a seasonal phenomenon, largely affected by climate in the basin. Seasonal variations in precipitation, surface run-off, interflow, groundwater flow and pumped inflows and outflows have a strong effect on river discharge and subsequently on the concentration of pollutants in river water (Vega et al., 1998). Since rivers constitute the main inland water resources for domestic, industrial and irrigation purposes, it is imperative to prevent and control river pollution and to have reliable information on water quality for effective management. In view of the spatial and temporal variations in the hydrochemistry of rivers, regular monitoring programs are required for reliable estimates of water quality. These programs produce a huge and complex data matrix comprising a large number of physico-chemical parameters (Chapman, 1992), from which it is often difficult to draw meaningful conclusions (Dixon and Chiswell, 1996). In India, under the National Rivers Conservation Program, the water quality of all major river systems is regularly monitored at several sites for a large number of physico-chemical, bacteriological and hydrological parameters, yielding very large databases of high complexity. Such monitoring programs involve huge financial inputs. Thus, there is a need to optimize the monitoring networks and the number of water-quality parameters, reducing them to representative ones without losing useful information. Multivariate statistical techniques and exploratory data analysis are appropriate tools for meaningful data reduction and interpretation of multi-constituent chemical and physical measurements (Massart et al., 1988).

Multivariate statistical techniques such as cluster analysis (CA), factor analysis (FA), principal component analysis (PCA) and discriminant analysis (DA) have been widely used as unbiased methods for drawing meaningful information from water-quality data (Brown et al., 1996; Vega et al., 1998; Helena et al., 2000; Bengraine and Marhaba, 2003; Voncina et al., 2002; Liu et al., 2003; Reghunath et al., 2002; Wunderlin et al., 2001; Simeonov et al., 2003). The multivariate treatment of data is widely used to characterize and evaluate surface and freshwater quality, and it is useful for evidencing temporal and spatial variations caused by natural and anthropogenic factors linked to seasonality (Vega et al., 1998; Reisenhofer et al., 1998; Helena et al., 2000).

Cluster analysis groups objects (cases) into classes (clusters) on the basis of similarities within a class and dissimilarities between different classes. The class characteristics are not known in advance but may be determined from the analysis. The results of CA help in interpreting the data and indicate patterns (Vega et al., 1998). Factor analysis, which includes PCA, is a very powerful technique applied to reduce the dimensionality of a data set consisting of a large number of inter-related variables, while retaining as much as possible of the variability present in the data set. This reduction is achieved by transforming the data set into a new set of variables, the principal components (PCs), which are orthogonal (non-correlated) and are arranged in decreasing order of importance. Mathematically, the PCs are computed from the covariance or another cross-product matrix, which describes the dispersion of the multiple measured parameters, to obtain eigenvalues and eigenvectors. Principal components are the linear combinations of the original variables and the eigenvectors (Wunderlin et al., 2001). Varifactors (VFs), a new group of variables, are obtained by rotating the axes defined by PCA. Varimax rotation distributes the PC loadings such that their dispersion is maximized by minimizing the number of large and small coefficients (Richman, 1986). This gives considerable data reduction, and the variability of the entire data set is described by only a few VFs/PCs without losing much information. Further, grouping of the studied variables according to their common features by VFs helps in data interpretation (Vega et al., 1998; Morales et al., 1999; Helena et al., 2000; Simeonov et al., 2003). In contrast to the exploratory features of CA, DA provides statistical classification of samples and is performed with prior knowledge of the membership of objects to a particular group or cluster (for example, the temporal or spatial grouping of a sample is known from its sampling time or site). Further, DA helps in grouping samples that share common properties. Although not as common as CA and FA/PCA, DA has recently been applied successfully to water-quality (Wunderlin et al., 2001) and other data sets (Wiggins et al., 1999; Parveen et al., 1999; Hagedorn et al., 1999).
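The eigenvalue computation behind PCA described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the study: it assumes NumPy, uses the correlation matrix (the covariance of standardized variables) as the cross-product matrix, and runs on a synthetic data matrix. The function name pca_from_correlation and all variable names are hypothetical, and varimax rotation is not included.

```python
import numpy as np

def pca_from_correlation(data):
    """Return eigenvalues, eigenvectors and PC scores for a samples x variables array."""
    # Standardize each variable (z-scores) so that differing units do not dominate.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Correlation matrix of the variables (assumed choice of cross-product matrix).
    corr = np.corrcoef(z, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    # Reorder so that the PCs are arranged in decreasing order of importance.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # PC scores are linear combinations of the standardized variables and the eigenvectors.
    scores = z @ eigvecs
    return eigvals, eigvecs, scores

# Synthetic stand-in for a water-quality matrix (NOT the Gomti data):
# 200 samples of 6 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
data = latent @ mixing + 0.3 * rng.normal(size=(200, 6))

eigvals, eigvecs, scores = pca_from_correlation(data)
explained = eigvals / eigvals.sum()   # share of total variance per PC
cumulative = np.cumsum(explained)     # cumulative share, used to decide how many PCs to keep
print(np.round(cumulative, 3))
```

With a correlation matrix the eigenvalues sum to the number of variables, so each eigenvalue divided by that sum is the fraction of total variance carried by the corresponding PC. The PC scores are mutually uncorrelated, which is the orthogonality property noted in the text.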

The present study examines the efficiency of three different multivariate statistical techniques (CA, FA/PCA and DA) in evaluating both the spatial and temporal variations in the water-quality data matrix of the Gomti River (India) without losing important information. The data matrix (17,790 observations) was generated under a 5-year (1994–1998) monitoring program.


Monitoring area

The Gomti River, a major tributary of the Ganga River system in northern India, has been selected for this case study. The river originates from a natural reservoir in a forested area (elevation of about 200 m; North latitude 28° 34′ and East longitude 80° 07′) near Pilibhit town in Uttar Pradesh, about 50 km south of the Himalayan foothills. Flowing through the central and eastern part of Uttar Pradesh, the river traverses a total distance of about 730 km before finally merging with the Ganga

Results and discussion

Water-quality monitoring of the Gomti River was conducted regularly over a period of 5 years (1994–1998) at eight different sites. All the samples were analysed for 24 parameters, and their site-wise mean values and standard deviations are summarized in Table 1.
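As a rough illustration of how site-wise summaries of this kind can feed a hierarchical cluster analysis, the sketch below computes per-site means and standard deviations and then groups the sites agglomeratively. Everything in it is an assumption made for illustration: the data are synthetic (eight hypothetical sites, three hypothetical parameters, 20 samples each), average linkage on Euclidean distances is used for simplicity and may differ from the linkage used in the study, and the function name average_linkage_clusters is hypothetical.

```python
import numpy as np

def average_linkage_clusters(points, n_clusters):
    """Agglomerative clustering (average linkage, Euclidean distance) down to n_clusters groups."""
    clusters = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between the members of the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Synthetic data (NOT the Gomti measurements): 8 sites x 3 parameters, with
# sites 0-2, 3-5 and 6-7 drawn around three different hypothetical levels.
rng = np.random.default_rng(1)
levels = np.array([[1.0, 2.0, 8.0]] * 3 + [[5.0, 6.0, 5.0]] * 3 + [[10.0, 12.0, 2.0]] * 2)
samples = levels[:, None, :] + 0.2 * rng.normal(size=(8, 20, 3))

site_mean = samples.mean(axis=1)         # site-wise mean per parameter
site_std = samples.std(axis=1, ddof=1)   # site-wise standard deviation per parameter

# Standardize the site means so that no single parameter dominates the distances.
z = (site_mean - site_mean.mean(axis=0)) / site_mean.std(axis=0, ddof=1)
labels = average_linkage_clusters(z, 3)
print(labels)
```

Standardizing before clustering is a common choice when parameters are measured in different units. Whether the original analysis standardized the data in this way is not stated in this excerpt.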

The temporal variations of the river water-quality parameters (Table 1) were evaluated through season-parameter correlation matrix, which showed that all the measured parameters (24 nos.) were found significantly (p<0.05)

Conclusions

Water-quality monitoring programs generate complex multidimensional data that need multivariate statistical treatment for analysis and interpretation of the underlying information. In this case study, hierarchical CA helped to group the eight sampling sites into three clusters with similar water-quality characteristics and pollution (natural and anthropogenic) sources. Extracted grouping information can be of use in reducing the number of sampling sites on the

Acknowledgements

The authors would like to thank the National River Conservation Directorate (NRCD), Ministry of Environment & Forests, Govt. of India for financial support and Director, ITRC, Lucknow for encouragement. Suggestions and help provided by Prof. V. Simeonov (Faculty of Chemistry, University of Sofia, Bulgaria) and Prof. DA Wunderlin (Facultad de Ciencias Quimicas, Universidad National de Cordoba, Argentina) in multivariate analysis of data are thankfully acknowledged.

