Interpreting Cluster Structure in Waveform Data with Visual Assessment and Dunn’s Index

In: Frontiers in Computational Intelligence

Part of the book series: Studies in Computational Intelligence (SCI, volume 739)

Abstract

Dunn’s index was introduced in 1974 as a way to define and identify a “best” crisp partition on n objects represented by either unlabeled feature vectors or dissimilarity matrix data. This article examines the intimate relationship that exists between Dunn’s index, single linkage clustering, and a visual method called iVAT for estimating the number of clusters in the input data. The relationship of Dunn’s index to iVAT and single linkage in the labeled data case affords a means to better understand the utility of these three companion methods when data are crisply clustered in the unlabeled case (the real case). Numerical examples using simulated waveform data drawn from the field of neuroscience illustrate the natural compatibility of Dunn’s index with iVAT and single linkage. A second aim of this note is to study customizing the three methods by changing the distance measure from Euclidean distance to one that may be more appropriate for assessing the validity of crisp clusters of finite sets of waveform data. We present numerical examples that support our assertion that when used collectively, the three methods afford a useful approach to evaluation of crisp clusters in unlabeled waveform data.
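
As a concrete reading of the index discussed above, the following minimal Python sketch computes Dunn's index in its usual form: the smallest distance between points in different clusters divided by the largest cluster diameter. It assumes a square dissimilarity matrix D and a crisp label vector. The function name dunn_index and the toy data are illustrative and do not come from the chapter.

```python
import numpy as np


def dunn_index(D, labels):
    """Dunn's index from a square dissimilarity matrix D and crisp labels."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    # Index sets of the crisp clusters.
    clusters = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    # Largest diameter: maximum dissimilarity within any one cluster.
    max_diam = max(D[np.ix_(idx, idx)].max() for idx in clusters)
    # Smallest separation: minimum dissimilarity between two different clusters.
    min_sep = min(
        D[np.ix_(a, b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    if max_diam == 0:
        # Assumption: all clusters are singletons or duplicates; report infinity.
        return float("inf")
    return float(min_sep / max_diam)


# Toy data (illustrative): four points on a line forming two tight pairs.
x = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(x[:, None] - x[None, :])
good = dunn_index(D, [0, 0, 1, 1])  # compact, well-separated partition
bad = dunn_index(D, [0, 1, 0, 1])   # partition that mixes the two pairs
```

Larger values indicate more compact and better separated clusters, so on this toy data the partition that keeps each pair together scores far higher than the one that splits them.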

References

  1. Bezdek JC (2017) A primer on cluster analysis: 4 basic methods that (usually) work, 1st edn. Design Publishing, Sarasota, FL

  2. Theodoridis S (2009) Pattern recognition. Academic Press, London. ISBN 978-1-59749-272-0

  3. Duda RO, Hart PE, Stork DG (2012) Pattern classification. Wiley

  4. Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice Hall College Div, Englewood Cliffs, NJ

  5. Dubes R, Jain AK (1979) Validity studies in clustering methodologies. Pattern Recognition 11(4):235–254. ISSN 0031-3203. doi:10.1016/0031-3203(79)90034-7

  6. Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179. ISSN 0033-3123, 1860-0980. doi:10.1007/BF02294245

  7. Gurrutxaga I, Muguerza J, Arbelaitz O, Pérez JM, Martín JI (2011) Towards a standard methodology to evaluate internal cluster validity indices. Pattern Recognit Lett 32(3):505–515. ISSN 0167-8655. doi:10.1016/j.patrec.2010.11.006

  8. Dimitriadou E, Dolničar S, Weingessel A (2002) An examination of indexes for determining the number of clusters in binary data sets. Psychometrika 67(1):137–159. ISSN 0033-3123, 1860-0980. doi:10.1007/BF02294713

  9. Vinh NX, Epps J, Bailey J (2010) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res 11:2837–2854. ISSN 1532-4435

  10. Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I (2013) An extensive comparative study of cluster validity indices. Pattern Recognition 46(1):243–256. ISSN 0031-3203. doi:10.1016/j.patcog.2012.07.021

  11. Dunn JC (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57. ISSN 0022-0280. doi:10.1080/01969727308546046

  12. Bezdek JC, Pal NR (1998) Some new indexes of cluster validity. IEEE Trans Syst Man Cybern Part B (Cybern) 28(3):301–315. ISSN 1083-4419. doi:10.1109/3477.678624

  13. Paparrizos J, Gravano L (2015) K-shape: efficient and accurate clustering of time series. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, SIGMOD ’15. ACM, New York, NY, USA, pp 1855–1870. ISBN 978-1-4503-2758-9. doi:10.1145/2723372.2737793

  14. Morris BT, Trivedi MM (2008) A survey of vision-based trajectory learning and analysis for surveillance. IEEE Trans Circuits Syst Video Technol 18(8):1114–1127. ISSN 1051-8215. doi:10.1109/TCSVT.2008.927109

  15. Valdés JJ, Alsulaiman FA, El Saddik A (2016) Visualization of handwritten signatures based on haptic information. In: Abielmona R, Falcon R, Zincir-Heywood N, Abbass HA (eds) Recent advances in computational intelligence in defense and security, number 621 in studies in computational intelligence. Springer International Publishing, pp 277–307. ISBN 978-3-319-26448-6, 978-3-319-26450-9. doi:10.1007/978-3-319-26450-9_11

  16. Bezdek JC, Hathaway RJ (2002) VAT: a tool for visual assessment of (cluster) tendency. In: Proceedings of the 2002 international joint conference on neural networks, IJCNN ’02, vol 3, pp 2225–2230. doi:10.1109/IJCNN.2002.1007487

  17. Weinstein JN (2008) A postgenomic visual icon. Science 319(5871):1772–1773. ISSN 0036-8075, 1095-9203. doi:10.1126/science.1151888

  18. Wilkinson L, Friendly M (2009) The history of the cluster heat map. Am Stat 63(2):179–184. ISSN 0003-1305. doi:10.1198/tas.2009.0033

  19. Prim RC (1957) Shortest connection networks and some generalizations. Bell Syst Tech J 36(6):1389–1401. ISSN 1538-7305. doi:10.1002/j.1538-7305.1957.tb01515.x

  20. Havens TC, Bezdek JC (2012) An efficient formulation of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans Knowl Data Eng 24(5):813–822. ISSN 1041-4347. doi:10.1109/TKDE.2011.33

  21. Gower JC, Ross GJS (1969) Minimum spanning trees and single linkage cluster analysis. J R Stat Soc Ser C (Appl Stat) 18(1):54–64. ISSN 0035-9254. doi:10.2307/2346439

  22. Kumar D, Bezdek JC, Palaniswami M, Rajasegarar S, Leckie C, Havens TC (2016) A hybrid approach to clustering in big data. IEEE Trans Cybern 46(10):2372–2385. ISSN 2168-2267. doi:10.1109/TCYB.2015.2477416

  23. Havens TC, Bezdek JC, Keller JM, Popescu M, Huband JM (2009) Is VAT really single linkage in disguise? Ann Math Artif Intell 55(3–4):237. ISSN 1012-2443, 1573-7470. doi:10.1007/s10472-009-9157-2

  24. Havens TC, Bezdek JC, Palaniswami M (2013) Scalable single linkage hierarchical clustering for big data. In: 2013 IEEE eighth international conference on intelligent sensors, sensor networks and information processing, pp 396–401. doi:10.1109/ISSNIP.2013.6529823

  25. Havens TC, Bezdek JC, Keller JM, Popescu M (2008) Dunn’s cluster validity index as a contrast measure of VAT images. In: 2008 19th international conference on pattern recognition, pp 1–4. doi:10.1109/ICPR.2008.4761772

  26. Regalia G, Coelli S, Biffi E, Ferrigno G, Pedrocchi A (2016) A framework for the comparative assessment of neuronal spike sorting algorithms towards more accurate off-line and on-line microelectrode arrays data analysis. Comput Intell Neurosci 2016:e8416237. ISSN 1687-5265. doi:10.1155/2016/8416237

  27. Barthó P, Hirase H, Monconduit L, Zugaro M, Harris KD, Buzsáki G (2004) Characterization of neocortical principal cells and interneurons by network interactions and extracellular features. J Neurophysiol 92(1):600–608. ISSN 0022-3077. doi:10.1152/jn.01170.2003

  28. Kumar D, Bezdek JC, Rajasegarar S, Leckie C, Palaniswami M (2017) A visual-numeric approach to clustering and anomaly detection for trajectory data. Vis Comput 33(3):265–281. ISSN 0178-2789, 1432-2315. doi:10.1007/s00371-015-1192-x

  29. van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605. ISSN 1533-7928

  30. Rutishauser U, Schuman EM, Mamelak AN (2006) Online detection and sorting of extracellularly recorded action potentials in human medial temporal lobe recordings, in vivo. J Neurosci Methods 154(1–2):204–224. ISSN 0165-0270. doi:10.1016/j.jneumeth.2005.12.033

  31. Ruiz EV, Nolla FC, Segovia HR (1985) Is the DTW “distance” really a metric? An algorithm reducing the number of DTW comparisons in isolated word recognition. Speech Commun 4(4):333–344. ISSN 0167-6393. doi:10.1016/0167-6393(85)90058-5

  32. Wachman G, Khardon R, Protopapas P, Alcock CR (2009) Kernels for periodic time series arising in astronomy. In: Machine learning and knowledge discovery in databases. Springer, Heidelberg, pp 489–505. doi:10.1007/978-3-642-04174-7_32

  33. Cao Y, Rakhilin N, Gordon PH, Shen X, Kan EC (2016) A real-time spike classification method based on dynamic time warping for extracellular enteric neural recording with large waveform variability. J Neurosci Methods 261:97–109. ISSN 0165-0270. doi:10.1016/j.jneumeth.2015.12.006

  34. Kim S, McNames J (2007) Automatic spike detection based on adaptive template matching for extracellular neural recordings. J Neurosci Methods 165(2):165–174. ISSN 0165-0270. doi:10.1016/j.jneumeth.2007.05.033

  35. Franke F, Pröpper R, Alle H, Meier P, Geiger JRP, Obermayer K, Munk MHJ (2015) Spike sorting of synchronous spikes from local neuron ensembles. J Neurophysiol 114(4):2535–2549. ISSN 0022-3077, 1522-1598. doi:10.1152/jn.00993.2014

Author information

Corresponding author

Correspondence to Sara Mahallati.

Appendices

Appendix 1. The VAT and iVAT Reordering Algorithms

[Figure a: listing of the VAT and iVAT reordering algorithms; image not reproduced here]

A.1 The input matrix D for VAT in line 1 must be symmetric and positive definite (here in the dissimilarity sense: nonnegative entries with a zero diagonal, not positive definite in the eigenvalue sense). Any distance matrix is of this type, but some inputs do not satisfy these constraints. The size of D can also be an issue: this basic version is only useful for fairly small values of n (say, n up to 10,000 or so). Extensions to rectangular, asymmetric and big data inputs are covered in the notes and remarks for this chapter.

A.2 Prim’s MST algorithm usually starts at either end (i.e., vertex) of a smallest weight edge. The initialization at line 3 starts at the opposite extreme: either end of a largest weight edge. This protects VAT against a certain type of off-course deviation that is discussed in Bezdek and Hathaway (2002) [16].

A.3 The argmax and argmin function calls in lines 3, 7 and 15 produce sets, not single values. For example, in A4.1 the argmax in line 3 is the set of all ordered pairs (i, j) that attain the maximum distance. In case of ties, use a vertex from either end of any one edge in the set.
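
The notes above can be made concrete with a short Python sketch. It is a sketch under stated assumptions, not a transcription of the listing: vat_order follows the Prim-like reordering described in A.2 (start at one end of a largest edge, then repeatedly add the unordered vertex closest to the ordered set), and ivat_from_vat follows the recursive formulation of Havens and Bezdek [20] as we understand it. The function names are hypothetical, and ties are broken by taking the first candidate, which is one valid choice under A.3.

```python
import numpy as np


def vat_order(D):
    """Return a VAT ordering (permutation of 0..n-1) for dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start at one end of a largest-weight edge (first maximum on ties).
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    # dist_to_set[q] is the smallest distance from q to any ordered vertex.
    dist_to_set = D[i].copy()
    for _ in range(1, n):
        candidates = np.where(remaining, dist_to_set, np.inf)
        j = int(np.argmin(candidates))  # closest unordered vertex
        order.append(j)
        remaining[j] = False
        dist_to_set = np.minimum(dist_to_set, D[j])
    return np.array(order)


def ivat_from_vat(Dv):
    """Path-based (minimax) distances computed from a VAT-reordered matrix Dv."""
    Dv = np.asarray(Dv, dtype=float)
    n = Dv.shape[0]
    Dp = np.zeros_like(Dv)
    for r in range(1, n):
        j = int(np.argmin(Dv[r, :r]))    # nearest previously ordered vertex
        Dp[r, j] = Dv[r, j]
        c = np.delete(np.arange(r), j)   # the other previously ordered vertices
        Dp[r, c] = np.maximum(Dv[r, j], Dp[j, c])
        Dp[:r, r] = Dp[r, :r]            # keep the matrix symmetric
    return Dp


# Toy data (illustrative): two pairs of points on a line, listed interleaved.
x = np.array([0.0, 10.0, 1.0, 11.0])
D = np.abs(x[:, None] - x[None, :])
order = vat_order(D)
Dv = D[np.ix_(order, order)]  # VAT-reordered dissimilarities
Dp = ivat_from_vat(Dv)        # iVAT matrix; dark diagonal blocks suggest clusters
```

On this toy input the reordering places the two pairs next to each other, and the iVAT matrix shows two 2 x 2 diagonal blocks of small values separated by uniformly large off-block values.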

Appendix 2. Basic Single Linkage Clustering Algorithm

[Figure b: listing of the basic single linkage clustering algorithm; image not reproduced here]
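
For readers without access to the listing, the following is a minimal Python sketch of agglomerative single linkage, assuming a square dissimilarity matrix D and a requested number of clusters c. It is a naive illustration of the merge rule (join the two clusters whose closest members are nearest), not the chapter's listing. The function name single_linkage is hypothetical.

```python
import numpy as np


def single_linkage(D, c):
    """Crisp labels from naive single linkage, stopping at c clusters."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if not 1 <= c <= n:
        raise ValueError("c must be between 1 and the number of objects")
    clusters = [[i] for i in range(n)]  # start from singletons
    while len(clusters) > c:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage set distance: closest pair across clusters.
                d = D[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the two closest clusters
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Toy data (illustrative): a group near 0 and a group near 10.
x = np.array([0.0, 10.0, 1.0, 11.0, 0.5])
D = np.abs(x[:, None] - x[None, :])
labels = single_linkage(D, 2)
```

This version recomputes set distances at every step and is only suitable for small n. It is meant to show the merge rule rather than an efficient implementation.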

Appendix 3. Shape Based Distance

[Figure c: listing of the shape based distance computation; image not reproduced here]
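
As an illustration, and assuming this appendix refers to the shape-based distance (SBD) of Paparrizos and Gravano [13], the sketch below computes SBD as one minus the maximum normalized cross-correlation over all shifts. It uses direct cross-correlation for clarity, whereas a faster implementation would typically use the FFT. The function name shape_based_distance and the return value for all-zero inputs are assumptions made here.

```python
import numpy as np


def shape_based_distance(x, y):
    """SBD: 1 minus the maximum normalized cross-correlation over all shifts."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        # Assumption: treat an all-zero waveform as uncorrelated with anything.
        return 1.0
    # Cross-correlation at every possible shift of y against x.
    cc = np.correlate(x, y, mode="full")
    return float(1.0 - cc.max() / denom)


# Toy waveforms (illustrative): the same bump at two different positions.
w1 = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0])
w3 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0])
d_same_shape = shape_based_distance(w1, w2)   # shifted copy, distance near 0
d_diff_shape = shape_based_distance(w1, w3)   # different shape, larger distance
```

Values range from 0 (identical shape up to shift and positive scaling) to 2. In the k-Shape setting the waveforms are usually z-normalized before the distance is computed, a step omitted here.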

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Mahallati, S., Bezdek, J.C., Kumar, D., Popovic, M.R., Valiante, T.A. (2018). Interpreting Cluster Structure in Waveform Data with Visual Assessment and Dunn’s Index. In: Mostaghim, S., Nürnberger, A., Borgelt, C. (eds) Frontiers in Computational Intelligence. Studies in Computational Intelligence, vol 739. Springer, Cham. https://doi.org/10.1007/978-3-319-67789-7_6

  • DOI: https://doi.org/10.1007/978-3-319-67789-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67788-0

  • Online ISBN: 978-3-319-67789-7

  • eBook Packages: Engineering, Engineering (R0)
