Interpreting Cluster Structure in Waveform Data with Visual Assessment and Dunn’s Index

Mahallati, Sara; Bezdek, James C.; Kumar, Dheeraj; Popovic, Milos R.; Valiante, Taufik A.

doi:10.1007/978-3-319-67789-7_6


Part of the book series: Studies in Computational Intelligence (SCI, volume 739)


Abstract

Dunn’s index was introduced in 1974 as a way to define and identify a “best” crisp partition on n objects represented by either unlabeled feature vectors or dissimilarity matrix data. This article examines the intimate relationship that exists between Dunn’s index, single linkage clustering, and a visual method called iVAT for estimating the number of clusters in the input data. The relationship of Dunn’s index to iVAT and single linkage in the labeled data case affords a means to better understand the utility of these three companion methods when data are crisply clustered in the unlabeled case (the real case). Numerical examples using simulated waveform data drawn from the field of neuroscience illustrate the natural compatibility of Dunn’s index with iVAT and single linkage. A second aim of this note is to study customizing the three methods by changing the distance measure from Euclidean distance to one that may be more appropriate for assessing the validity of crisp clusters of finite sets of waveform data. We present numerical examples that support our assertion that when used collectively, the three methods afford a useful approach to evaluation of crisp clusters in unlabeled waveform data.
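
For readers who want a concrete handle on the quantity discussed above, Dunn's index of a crisp partition is commonly written as the smallest distance between two points in different clusters divided by the largest cluster diameter, so larger values indicate compact, well-separated clusters. The following is a minimal sketch under that common definition, not code from the chapter; the function name `dunn_index`, the list-of-lists input D and the toy data are illustrative assumptions.

```python
def dunn_index(D, labels):
    """Dunn's index from a square dissimilarity matrix D (list of lists)
    and one crisp label per object. Assumes the common definition:
    (min between-cluster point distance) / (max cluster diameter)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    groups = list(clusters.values())
    if len(groups) < 2:
        raise ValueError("Dunn's index needs at least two clusters")

    # Largest diameter: maximum pairwise distance inside any one cluster.
    max_diam = max(max(D[i][j] for i in g for j in g) for g in groups)

    # Smallest separation: minimum distance between points in different clusters.
    min_sep = min(
        D[i][j]
        for a in range(len(groups))
        for b in range(a + 1, len(groups))
        for i in groups[a]
        for j in groups[b]
    )
    if max_diam == 0:
        return float("inf")  # every cluster is a single point or coincident points
    return min_sep / max_diam


# Toy data (an assumption for illustration): four points on a line.
pts = [0, 1, 10, 11]
D = [[abs(a - b) for b in pts] for a in pts]
good = dunn_index(D, [0, 0, 1, 1])  # natural grouping
bad = dunn_index(D, [0, 1, 0, 1])   # groups that straddle the gap
```

On this toy input the natural grouping scores far higher than the straddling one, which is the behaviour the index is meant to reward.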


References

Bezdek JC (2017) A primer on cluster analysis: 4 basic methods that (usually) work, 1st edn. Design Publishing, Sarasota, FL
Theodoridis S (2009) Pattern recognition. Academic Press, London. ISBN 978-1-59749-272-0
Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice Hall College Div, Englewood Cliffs, NJ
Dubes R, Jain AK (1979) Validity studies in clustering methodologies. Pattern Recognit 11(4):235–254. ISSN 0031-3203. doi:10.1016/0031-3203(79)90034-7
Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179. ISSN 0033-3123, 1860-0980. doi:10.1007/BF02294245
Gurrutxaga I, Muguerza J, Arbelaitz O, Pérez JM, Martín JI (2011) Towards a standard methodology to evaluate internal cluster validity indices. Pattern Recognit Lett 32(3):505–515. ISSN 0167-8655. doi:10.1016/j.patrec.2010.11.006
Dimitriadou E, Dolničar S, Weingessel A (2002) An examination of indexes for determining the number of clusters in binary data sets. Psychometrika 67(1):137–159. ISSN 0033-3123, 1860-0980. doi:10.1007/BF02294713
Vinh NX, Epps J, Bailey J (2010) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res 11:2837–2854. ISSN 1532-4435
Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I (2013) An extensive comparative study of cluster validity indices. Pattern Recognit 46(1):243–256. ISSN 0031-3203. doi:10.1016/j.patcog.2012.07.021
Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57. ISSN 0022-0280. doi:10.1080/01969727308546046
Bezdek JC, Pal NR (1998) Some new indexes of cluster validity. IEEE Trans Syst Man Cybern Part B (Cybern) 28(3):301–315. ISSN 1083-4419. doi:10.1109/3477.678624
Paparrizos J, Gravano L (2015) K-shape: efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, SIGMOD ’15, New York, NY, USA, 2015. ACM, pp 1855–1870. ISBN 978-1-4503-2758-9. doi:10.1145/2723372.2737793
Morris BT, Trivedi MM (2008) A survey of vision-based trajectory learning and analysis for surveillance. IEEE Trans Circuits Syst Video Technol 18(8):1114–1127. ISSN 1051-8215. doi:10.1109/TCSVT.2008.927109
Valdés JJ, Alsulaiman FA, Saddik AEl (2016) Visualization of handwritten signatures based on haptic information. In: Abielmona R, Falcon R, Zincir-Heywood N, Abbass HA (eds) Recent advances in computational intelligence in defense and security, number 621 in studies in computational intelligence. Springer International Publishing, pp 277–307. ISBN 978-3-319-26448-6 978-3-319-26450-9. doi:10.1007/978-3-319-26450-9-11
Bezdek JC, Hathaway RJ (2002) VAT: a tool for visual assessment of (cluster) tendency. In: Proceedings of the 2002 international joint conference on neural networks, 2002. IJCNN ’02, vol 3, pp 2225–2230. doi:10.1109/IJCNN.2002.1007487
Weinstein JN (2008) A postgenomic visual icon. Science 319(5871):1772–1773. ISSN 0036-8075, 1095-9203. doi:10.1126/science.1151888
Wilkinson L, Friendly M (2009) The history of the cluster heat map. Am Stat 63(2):179–184. ISSN 0003-1305. doi:10.1198/tas.2009.0033
Prim RC (1957) Shortest connection networks and some generalizations. Bell Syst Tech J 36(6):1389–1401. ISSN 1538-7305. doi:10.1002/j.1538-7305.1957.tb01515.x
Havens TC, Bezdek JC (2012) An efficient formulation of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans Knowl Data Eng 24(5):813–822. ISSN 1041-4347. doi:10.1109/TKDE.2011.33
Gower JC, Ross GJS (1969) Minimum spanning trees and single linkage cluster analysis. J R Stat Soc Ser C (Appl Stat), 18(1):54–64. ISSN 0035-9254. doi:10.2307/2346439
Kumar D, Bezdek JC, Palaniswami M, Rajasegarar S, Leckie C, Havens TC (2016) A hybrid approach to clustering in big data. IEEE Trans Cybern 46(10):2372–2385. ISSN 2168-2267. doi:10.1109/TCYB.2015.2477416
Havens TC, Bezdek JC, Keller JM, Popescu M, Huband JM (2009) Is VAT really single linkage in disguise? Ann Math Artif Intell 55(3–4):237. ISSN 1012-2443, 1573-7470. doi:10.1007/s10472-009-9157-2
Havens TC, Bezdek JC, Palaniswami M (2013) Scalable single linkage hierarchical clustering for big data. In: 2013 IEEE eighth international conference on intelligent sensors, sensor networks and information processing, pp 396–401. doi:10.1109/ISSNIP.2013.6529823
Havens TC, Bezdek JC, Keller JM, Popescu M (2008) Dunn’s cluster validity index as a contrast measure of VAT images. In: 2008 19th international conference on pattern recognition, pp 1–4. doi:10.1109/ICPR.2008.4761772
Regalia G, Coelli S, Biffi E, Ferrigno G, Pedrocchi A (2016) A framework for the comparative assessment of neuronal spike sorting algorithms towards more accurate off-line and on-line microelectrode arrays data analysis. Comput Intell Neurosci 2016:e8416237. ISSN 1687-5265. doi:10.1155/2016/8416237
Barthó P, Hirase H, Monconduit L, Zugaro M, Harris KD, Buzsáki G (2004) Characterization of neocortical principal cells and interneurons by network interactions and extracellular features. J Neurophysiol 92(1):600–608. ISSN 0022-3077. doi:10.1152/jn.01170.2003
Kumar D, Bezdek JC, Rajasegarar S, Leckie C, Palaniswami M (2017) A visual-numeric approach to clustering and anomaly detection for trajectory data. Vis Comput 33(3):265–281. ISSN 0178-2789, 1432-2315. doi:10.1007/s00371-015-1192-x
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605. ISSN 1533-7928
Rutishauser U, Schuman EM, Mamelak AN (2006) Online detection and sorting of extracellularly recorded action potentials in human medial temporal lobe recordings, in vivo. J Neurosci Methods 154(1–2):204–224. ISSN 0165-0270. doi:10.1016/j.jneumeth.2005.12.033
Ruiz EV, Nolla FC, Segovia HR (1985) Is the DTW “distance” really a metric? An algorithm reducing the number of DTW comparisons in isolated word recognition. Speech Commun 4(4):333–344. ISSN 0167-6393. doi:10.1016/0167-6393(85)90058-5
Wachman G, Khardon R, Protopapas P, Alcock CR (2009) Kernels for periodic time series arising in astronomy. In: Machine learning and knowledge discovery in databases. Springer, Heidelberg, pp 489–505. doi:10.1007/978-3-642-04174-7-32
Cao Y, Rakhilin N, Gordon PH, Shen X, Kan EC (2016) A real-time spike classification method based on dynamic time warping for extracellular enteric neural recording with large waveform variability. J Neurosci Methods 261:97–109. ISSN 0165-0270. doi:10.1016/j.jneumeth.2015.12.006
Kim S, McNames J (2007) Automatic spike detection based on adaptive template matching for extracellular neural recordings. J Neurosci Methods 165(2):165–174. ISSN 0165-0270. doi:10.1016/j.jneumeth.2007.05.033
Franke F, Pröpper R, Alle H, Meier P, Geiger JRP, Obermayer K, Munk MHJ (2015) Spike sorting of synchronous spikes from local neuron ensembles. J Neurophysiol 114(4):2535–2549. ISSN 0022-3077, 1522-1598. doi:10.1152/jn.00993.2014


Author information

Authors and Affiliations

Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, Canada
Sara Mahallati, Milos R. Popovic & Taufik A. Valiante
Toronto Rehabilitation Institute, University Health Network, Toronto, Canada
Sara Mahallati & Milos R. Popovic
Krembil Research Institute, University Health Network, Toronto, Canada
Taufik A. Valiante
Lyles School of Civil Engineering, Purdue University, West Lafayette, IN, USA
Dheeraj Kumar
Computer Science and Information Systems Departments, University of Melbourne, Melbourne, Australia
James C. Bezdek


Corresponding author

Correspondence to Sara Mahallati.

Editor information

Editors and Affiliations

Faculty of Computer Science, Otto von Guericke University Magdeburg, Magdeburg, Germany
Sanaz Mostaghim
Faculty of Computer Science, Otto von Guericke University Magdeburg, Magdeburg, Germany
Andreas Nürnberger
Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
Christian Borgelt

Appendices

Appendix 1. The VAT and iVAT Reordering Algorithms

A.1 The input matrix D for VAT in line 1 is positive definite and symmetric. Any distance matrix is of this type, but a number of cases do not satisfy these constraints. The size of D can also be an issue: this basic version is only useful for fairly small values of n (say, n up to 10,000 or so). Extensions to rectangular, asymmetric and big data inputs are covered in the notes and remarks for this chapter.

A.2 Prim’s MST algorithm usually starts at either end (i.e., vertex) of a smallest weight edge. Initialization at line 3 starts at the opposite extreme: either end of a largest weight edge. This keeps VAT from a certain type of off-course deviation that is discussed in Bezdek and Hathaway (2002).

A.3 The argmax and argmin function calls in lines 3, 7 and 15 produce sets, not single values. For example, the argmax in A4.1 is the set of all ordered pairs (i, j) that attain the maximum distance. In case of ties, use a vertex from either end of any one edge in the set.
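
The notes above can be illustrated with a minimal sketch of VAT reordering followed by a recursive iVAT transform in the spirit of Havens and Bezdek (2012). This is not the chapter's own listing, so its line numbers do not correspond to those cited in A.1 to A.3. The function names `vat_order`, `reorder` and `ivat` are illustrative, D is assumed to be a small symmetric dissimilarity matrix stored as a list of lists, and ties are broken by taking the first candidate found.

```python
def vat_order(D):
    """Return the VAT ordering of the objects described by D."""
    n = len(D)
    # A.2: start at one end of a largest-weight edge (first row holding the max).
    start = max(range(n), key=lambda p: max(D[p]))
    order = [start]
    remaining = [q for q in range(n) if q != start]
    # Distance from each unordered object to its nearest ordered object.
    nearest = {q: D[start][q] for q in remaining}
    while remaining:
        # Prim-style step: bring in the unordered object closest to the ordered set.
        j = min(remaining, key=lambda q: nearest[q])
        order.append(j)
        remaining.remove(j)
        for q in remaining:
            if D[j][q] < nearest[q]:
                nearest[q] = D[j][q]
    return order


def reorder(D, order):
    """Apply an ordering to the rows and columns of D."""
    return [[D[p][q] for q in order] for p in order]


def ivat(Dstar):
    """Recursive iVAT transform of a VAT-reordered matrix: each entry becomes
    the largest edge on the minimax path between the two objects."""
    n = len(Dstar)
    out = [[0.0] * n for _ in range(n)]
    for r in range(1, n):
        # Nearest already-processed object to object r.
        j = min(range(r), key=lambda k: Dstar[r][k])
        out[r][j] = out[j][r] = Dstar[r][j]
        for c in range(r):
            if c != j:
                out[r][c] = out[c][r] = max(Dstar[r][j], out[j][c])
    return out


# Toy data (an assumption for illustration): two pairs of points, shuffled.
pts = [10, 0, 11, 1]
D = [[abs(a - b) for b in pts] for a in pts]
order = vat_order(D)
D_ivat = ivat(reorder(D, order))
```

Displayed as a grey-scale image, `D_ivat` for this toy input has two dark 2 x 2 blocks on the diagonal, which is the visual cue for two clusters.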

Appendix 2. Basic Single Linkage Clustering Algorithm
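
The listing for this appendix is not reproduced here. As a stand-in, the sketch below shows one standard way to obtain a single linkage partition into c clusters: merge objects along the smallest edges (Kruskal-style, using union-find) until c components remain, which is equivalent to cutting the c - 1 largest edges of a minimum spanning tree (Gower and Ross 1969). The function name `single_linkage` and the list-of-lists input are illustrative assumptions, not the chapter's notation.

```python
def single_linkage(D, c):
    """Crisp single linkage labels for c clusters from a symmetric
    dissimilarity matrix D (list of lists)."""
    n = len(D)
    if not 1 <= c <= n:
        raise ValueError("c must be between 1 and the number of objects")

    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, smallest dissimilarity first.
    edges = sorted((D[i][j], i, j) for i in range(n) for j in range(i + 1, n))

    components = n
    for _, i, j in edges:
        if components == c:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge joins two different clusters
            parent[rj] = ri
            components -= 1

    # Relabel roots as 0, 1, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy data (an assumption for illustration).
pts = [10, 0, 11, 1]
D = [[abs(a - b) for b in pts] for a in pts]
labels = single_linkage(D, 2)
```

Here the objects at 10 and 11 share one label and those at 0 and 1 share the other, matching the block structure that a VAT or iVAT image of the same data would show.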

Appendix 3. Shape Based Distance
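
The body of this appendix is not reproduced here. Assuming it refers to the shape-based distance (SBD) of Paparrizos and Gravano (2015), which appears in the reference list, SBD between two equal-length waveforms is one minus the maximum, over all shifts, of their cross-correlation normalized by the geometric mean of their energies. The sketch below is a direct O(m^2) evaluation of that definition for clarity; the function name `sbd` and the example waveforms are illustrative assumptions, and practical implementations typically use an FFT instead.

```python
import math


def sbd(x, y):
    """Shape-based distance between two equal-length sequences.
    Values lie in [0, 2]; 0 means identical shape up to a shift."""
    m = len(x)
    if len(y) != m:
        raise ValueError("sequences must have the same length")
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        raise ValueError("SBD is undefined for an all-zero sequence")

    # Cross-correlation at every shift s, summing only over overlapping samples.
    best = max(
        sum(x[l + s] * y[l] for l in range(max(0, -s), min(m, m - s)))
        for s in range(-(m - 1), m)
    )
    return 1.0 - best / norm


# Example waveforms (assumptions for illustration): y is x delayed by two samples.
x = [0, 1, 2, 1, 0, 0]
y = [0, 0, 0, 1, 2, 1]
d_shifted = sbd(x, y)
d_flipped = sbd(x, [-v for v in x])
```

Because the maximum is taken over shifts, the delayed copy is at distance zero from the original, whereas Euclidean distance would treat the two as different; this shift invariance is the usual motivation for using SBD on waveform data.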


About this chapter

Cite this chapter

Mahallati, S., Bezdek, J.C., Kumar, D., Popovic, M.R., Valiante, T.A. (2018). Interpreting Cluster Structure in Waveform Data with Visual Assessment and Dunn’s Index. In: Mostaghim, S., Nürnberger, A., Borgelt, C. (eds) Frontiers in Computational Intelligence. Studies in Computational Intelligence, vol 739. Springer, Cham. https://doi.org/10.1007/978-3-319-67789-7_6


DOI: https://doi.org/10.1007/978-3-319-67789-7_6
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67788-0
Online ISBN: 978-3-319-67789-7
eBook Packages: Engineering, Engineering (R0)
