Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India)—a case study
Introduction
Surface waters are highly vulnerable to pollution because they are easily accessible for the disposal of wastewaters. Natural processes (such as precipitation inputs, erosion and weathering of crustal materials) and anthropogenic influences (such as urban, industrial and agricultural activities and the increasing exploitation of water resources) together determine the quality of surface water in a region (Carpenter et al., 1998; Jarvie et al., 1998). Rivers play a major role in assimilating or carrying off municipal and industrial wastewater and run-off from agricultural land. Municipal and industrial wastewater discharge is a constant polluting source, whereas surface run-off is a seasonal phenomenon, largely governed by the climate of the basin. Seasonal variations in precipitation, surface run-off, interflow, groundwater flow and pumped inflows and outflows strongly affect river discharge and, consequently, the concentration of pollutants in river water (Vega et al., 1998). Since rivers constitute the main inland water resources for domestic, industrial and irrigation purposes, it is imperative to prevent and control river pollution and to have reliable information on water quality for effective management. Because the hydrochemistry of rivers varies in space and time, regular monitoring programs are required for reliable estimates of water quality. Such programs yield a large and complex data matrix comprising many physico-chemical parameters (Chapman, 1992), which is often difficult to interpret and from which meaningful conclusions are hard to draw (Dixon and Chiswell, 1996). In India, under the National Rivers Conservation Program, the water quality of all major river systems is regularly monitored at several sites for a large number of physico-chemical, bacteriological and hydrological parameters, producing very large and complex databases. Such monitoring programs require substantial financial investment.
Thus, there is a need to optimize monitoring networks and to reduce the number of water-quality parameters to a representative set without losing useful information. Multivariate statistical techniques and exploratory data analysis are appropriate tools for meaningful data reduction and interpretation of multi-constituent chemical and physical measurements (Massart et al., 1988).
Multivariate statistical techniques such as cluster analysis (CA), factor analysis (FA), principal component analysis (PCA) and discriminant analysis (DA) have been widely used as unbiased methods in the analysis of water-quality data to extract meaningful information (Brown et al., 1996; Vega et al., 1998; Helena et al., 2000; Bengraine and Marhaba, 2003; Voncina et al., 2002; Liu et al., 2003; Reghunath et al., 2002; Wunderlin et al., 2001; Simeonov et al., 2003). Multivariate treatment of data is widely used to characterize and evaluate surface and freshwater quality, and it is useful for revealing temporal and spatial variations caused by natural and anthropogenic factors linked to seasonality (Vega et al., 1998; Reisenhofer et al., 1998; Helena et al., 2000).
Cluster analysis groups objects (cases) into classes (clusters) so that objects within a class are similar to one another and dissimilar to objects in other classes. The class characteristics are not known in advance but may be determined from the analysis. The results of CA help in interpreting the data and indicate patterns (Vega et al., 1998). Factor analysis, which includes PCA, is a powerful technique for reducing the dimensionality of a data set consisting of a large number of inter-related variables while retaining as much of the variability present in the data set as possible. This reduction is achieved by transforming the data set into a new set of variables, the principal components (PCs), which are orthogonal (uncorrelated) and arranged in decreasing order of importance. Mathematically, the PCs are computed from the eigenvalues and eigenvectors of the covariance matrix, or another cross-product matrix, that describes the dispersion of the multiple measured parameters. Each PC is a linear combination of the original variables, with coefficients given by the corresponding eigenvector (Wunderlin et al., 2001). Varifactors (VFs), a new group of variables, are obtained by rotating the axes defined by PCA. Varimax rotation redistributes the PC loadings so that the variance of the squared loadings is maximized; this pushes each loading toward either a large or a near-zero value and reduces the number of intermediate coefficients (Richman, 1986). Besides achieving considerable data reduction, FA/PCA describes the variability of the entire data set through only a few VFs/PCs without losing much information. Further, grouping the studied variables according to their common features through the VFs aids data interpretation (Vega et al., 1998; Morales et al., 1999; Helena et al., 2000; Simeonov et al., 2003). In contrast to the exploratory nature of CA, DA provides a statistical classification of samples and is performed with prior knowledge of the membership of objects in a particular group or cluster (for example, the temporal or spatial group of a sample is known from its sampling time or site).
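The PCA and varimax steps described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' procedure: the data are standardized to z-scores, the PCs are taken from the correlation matrix (one choice of cross-product matrix), loadings are eigenvectors scaled by the square root of their eigenvalues, and the rotation is a plain varimax without Kaiser normalization. The function names `pca_varimax` and `varimax` and the synthetic data are hypothetical.

```python
import numpy as np


def varimax(L, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix L (variables x factors).

    Maximizes the variance of the squared loadings in each column.
    Assumption: no Kaiser normalization is applied in this sketch.
    """
    p, k = L.shape
    T = np.eye(k)                      # accumulated orthogonal rotation
    d_old = 0.0
    for _ in range(max_iter):
        LR = L @ T
        # gradient of the varimax criterion with respect to the rotation
        G = L.T @ (LR**3 - LR @ np.diag(np.sum(LR**2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                     # closest orthogonal matrix to G
        d = s.sum()
        if d_old != 0 and d < d_old * (1 + tol):
            break                      # criterion has stopped improving
        d_old = d
    return L @ T


def pca_varimax(X, n_factors):
    """PCA of the correlation matrix of X (samples x parameters), then varimax."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-scores
    R = np.corrcoef(Z, rowvar=False)                    # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                # R is symmetric
    order = np.argsort(eigvals)[::-1]                   # decreasing importance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs          # PCs as linear combinations of the variables
    # loadings: eigenvectors scaled by sqrt(eigenvalue)
    L = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals, scores, L, varimax(L)


# Hypothetical example: 60 samples x 5 synthetic parameters
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
X[:, 1] += 2 * X[:, 0]            # make two parameters strongly correlated
eigvals, scores, L, L_rot = pca_varimax(X, n_factors=2)
print("eigenvalues:", np.round(eigvals, 2))
print("varifactor loadings:\n", np.round(L_rot, 2))
```

Because varimax is an orthogonal rotation, each variable's communality (the row sum of squared loadings) is unchanged; only the way the explained variance is spread across the VFs changes.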
Further, DA helps in grouping samples that share common properties. Although less common than CA and FA/PCA, DA has recently been applied successfully to water-quality data (Wunderlin et al., 2001) and other data sets (Wiggins et al., 1999; Parveen et al., 1999; Hagedorn et al., 1999).
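A basic form of the classification described above can be sketched as linear discriminant analysis with a pooled within-group covariance matrix. This is an assumption-laden sketch, not the DA configuration used in this study: the groups stand in for hypothetical sampling seasons, the data are synthetic, and `fit_lda` and `predict_lda` are illustrative names. Each sample is assigned to the group whose linear discriminant function gives the largest score.

```python
import numpy as np


def fit_lda(X, y):
    """Linear discriminant functions using a pooled within-group covariance."""
    classes = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    S = np.zeros((p, p))
    for c, m in zip(classes, means):
        D = X[y == c] - m              # deviations from the group mean
        S += D.T @ D
    S /= n - len(classes)              # pooled within-group covariance
    S_inv = np.linalg.inv(S)
    priors = np.array([np.mean(y == c) for c in classes])
    W = means @ S_inv                  # one row of coefficients per group
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)   # constant terms
    return classes, W, b


def predict_lda(model, X):
    """Assign each sample to the group with the largest discriminant score."""
    classes, W, b = model
    scores = X @ W.T + b
    return classes[np.argmax(scores, axis=1)]


# Hypothetical example: 3 groups (e.g. seasons) x 40 samples x 3 parameters
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 1.0], [0.0, 3.0, -1.0]])
y = np.repeat([0, 1, 2], 40)
X = centers[y] + rng.normal(scale=0.5, size=(120, 3))
model = fit_lda(X, y)
pred = predict_lda(model, X)
print("correctly classified: %.1f%%" % (100 * np.mean(pred == y)))
```

Classifying the same samples used to fit the functions tends to overstate accuracy; a hold-out or leave-one-out check gives a less optimistic estimate.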
In the present study, three multivariate statistical techniques (CA, FA/PCA and DA) were applied, and their efficiency was assessed, to evaluate both spatial and temporal variations in the water-quality data matrix of the Gomti River (India) without losing important information. The data matrix (17,790 observations) was generated under a 5-year (1994–1998) monitoring program.
Monitoring area
The Gomti River, a major tributary of the Ganga River system in northern India, was selected for this case study. The river originates from a natural reservoir in a forested area (elevation about 200 m; 28° 34′ N, 80° 07′ E) near Pilibhit town in Uttar Pradesh, about 50 km south of the Himalayan foothills. Flowing through the central and eastern parts of Uttar Pradesh, the river traverses a total distance of about 730 km before finally merging with the Ganga
Results and discussion
Water-quality monitoring of the Gomti River was conducted regularly over a period of 5 years (1994–1998) at eight different sites. All samples were analysed for 24 parameters, and their site-wise mean values and standard deviations are summarized in Table 1.
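The hierarchical CA introduced earlier can be sketched on site-level data such as the site-wise means summarized above. The choices here are assumptions for illustration: clustering site means rather than individual samples, z-scoring each parameter, squared Euclidean distance, and Ward's linkage via the Lance-Williams update. The function `ward_clusters` and the synthetic eight-site data are hypothetical.

```python
import numpy as np


def ward_clusters(X, n_clusters):
    """Agglomerative hierarchical clustering (Ward's linkage) of the rows of X.

    Parameters are z-scored first; distances are squared Euclidean, and
    merged-cluster distances use the Lance-Williams update for Ward's method.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n = Z.shape[0]
    D = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(D, np.inf)        # never merge a cluster with itself
    size = np.ones(n)
    labels = np.arange(n)              # current cluster id of each object
    active = list(range(n))
    while len(active) > n_clusters:
        sub = D[np.ix_(active, active)]
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = active[i], active[j]    # most similar pair of clusters
        for k in active:
            if k in (a, b):
                continue
            na, nb, nk = size[a], size[b], size[k]
            D[a, k] = D[k, a] = ((na + nk) * D[a, k] + (nb + nk) * D[b, k]
                                 - nk * D[a, b]) / (na + nb + nk)
        size[a] += size[b]
        labels[labels == b] = a        # cluster b is absorbed into cluster a
        active.remove(b)
    # renumber the surviving clusters as 0 .. n_clusters-1
    return np.unique(labels, return_inverse=True)[1]


# Hypothetical site-wise means: 8 sites x 4 parameters, three built-in groups
rng = np.random.default_rng(2)
true_group = np.array([0, 0, 0, 1, 1, 2, 2, 2])
base = np.array([[1.0, 1.0, 1.0, 1.0], [5.0, 1.0, 5.0, 1.0], [1.0, 5.0, 1.0, 5.0]])
site_means = base[true_group] + rng.normal(scale=0.2, size=(8, 4))
labels = ward_clusters(site_means, n_clusters=3)
print("cluster of each site:", labels)
```

Stopping at three clusters mirrors the three-group outcome reported in the Conclusions; in practice the cut level would be chosen by inspecting the dendrogram.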
The temporal variations of the river water-quality parameters (Table 1) were evaluated through a season-parameter correlation matrix, which showed that all 24 measured parameters were significantly (p<0.05)
Conclusions
Water-quality monitoring programs generate complex multidimensional data that require multivariate statistical treatment for analysis and for interpretation of the underlying information. In this case study, hierarchical CA grouped the eight sampling sites into three clusters with similar water-quality characteristics and pollution sources (natural and anthropogenic). The extracted grouping information can be used to reduce the number of sampling sites on the
Acknowledgements
The authors would like to thank the National River Conservation Directorate (NRCD), Ministry of Environment & Forests, Govt. of India for financial support and the Director, ITRC, Lucknow for encouragement. Suggestions and help provided by Prof. V. Simeonov (Faculty of Chemistry, University of Sofia, Bulgaria) and Prof. DA Wunderlin (Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Argentina) in the multivariate analysis of data are gratefully acknowledged.
References
- Bengraine and Marhaba (2003). Using principal component analysis to monitor spatial and temporal changes in water quality. J. Hazard. Mater. B.
- Dixon and Chiswell (1996). Review of aquatic monitoring program design. Water Res.
- Helena et al. (2000). Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga river, Spain) by principal component analysis. Water Res.
- Jarvie et al. (1998). Nitrogen and phosphorus in east-coast British rivers: speciation, sources and biological significance. Sci. Tot. Environ.
- Liu et al. (2003). Application of factor analysis in the assessment of groundwater quality in a blackfoot disease area in Taiwan. Sci. Tot. Environ.
- Morales et al. (1999). An environmental study by factor analysis of surface seawaters in the gulf of Valencia (Western Mediterranean). Anal. Chim. Acta.
- Reghunath et al. (2002). The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India. Water Res.
- Reisenhofer et al. (1998). Using chemical and physical parameters to define the quality of karstic freshwaters (Timavo River, North-eastern Italy): a chemometric approach. Water Res.
- Simeonov et al. (2003). Assessment of the surface water quality in Northern Greece. Water Res.
- Vega et al. (1998). Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Res.