Abstract
In several real-life and research situations, data are collected in the form of intervals, the so-called interval-valued data. In this paper, a fuzzy clustering method for analysing interval-valued data is presented. In particular, we address the problem of interval-valued data corrupted by outliers and noise. To cope with the presence of outliers, we propose to employ a robust metric based on the exponential distance in the framework of the Fuzzy C-medoids clustering model, which yields the Fuzzy C-medoids clustering model for interval-valued data with exponential distance. The exponential distance assigns small weights to outliers and larger weights to points that lie in compact regions of the data set, thus neutralizing the effect of anomalous interval-valued data. Simulation results on the behaviour of the proposed approach, together with two empirical applications, are provided to illustrate the practical usefulness of the method.
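As an illustration of why an exponential-type distance limits the influence of outliers, the following minimal sketch may help. It is not the authors' formulation: it assumes that each interval is represented by its midpoint and half-width, that the squared differences of both are simply summed, and that the distance has the bounded form 1 - exp(-beta * d^2). The function name `exp_distance`, the parameter `beta`, and the sample data are hypothetical.

```python
import math

def exp_distance(a, b, beta=1.0):
    """Bounded exponential-type distance between two interval-valued units.

    a, b : lists of (lower, upper) pairs, one pair per variable.
    beta : positive scale parameter (an assumption of this sketch).
    """
    sq = 0.0
    for (al, au), (bl, bu) in zip(a, b):
        # Midpoint (centre) and half-width (radius) of each interval.
        ca, ra = (al + au) / 2.0, (au - al) / 2.0
        cb, rb = (bl + bu) / 2.0, (bu - bl) / 2.0
        sq += (ca - cb) ** 2 + (ra - rb) ** 2
    # The result lies in [0, 1]: however far away a unit is,
    # its contribution to a clustering objective cannot exceed 1.
    return 1.0 - math.exp(-beta * sq)

# Hypothetical data: a medoid, a nearby unit, and an outlying unit.
medoid = [(0.0, 2.0), (1.0, 3.0)]
near = [(0.5, 2.5), (1.0, 3.0)]
far = [(50.0, 60.0), (40.0, 45.0)]

d_near = exp_distance(medoid, near)
d_far = exp_distance(medoid, far)
print(d_near, d_far)
```

Under these assumptions the distance saturates at 1, so a distant unit cannot dominate the objective function in the way it would with an unbounded squared Euclidean distance. How `beta` is chosen in the proposed model is not specified here.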
Notes
http://www.brace.sinanet.apat.it/web/struttura.html. Data retrieved on 2015-05-03.
Acknowledgments
The authors thank the Editors and the referees for their useful comments and suggestions which helped to improve the quality and presentation of this manuscript.
Cite this article
D’Urso, P., Massari, R., De Giovanni, L. et al. Exponential distance-based fuzzy clustering for interval-valued data. Fuzzy Optim Decis Making 16, 51–70 (2017). https://doi.org/10.1007/s10700-016-9238-8
DOI: https://doi.org/10.1007/s10700-016-9238-8