
Entropy-Based Variational Inference for Semi-Bounded Data Clustering in Medical Applications

Chapter in Artificial Intelligence and Data Mining in Healthcare

Abstract

Over the past decades, the unprecedented availability of various types of data, together with simultaneous advances in technology, has generated extensive interest in applying machine learning approaches to extract implicit patterns, acquire information, and retrieve latent, meaningful knowledge. Such powerful statistical tools have been applied across many fields of science.

Healthcare is one of the vital domains where these techniques can be deployed. The main motivation is that medical diagnosis procedures and healthcare examinations generate huge amounts of data of various types, such as text, images, video, and signals. Dealing with such large, complex data is beyond human capacity. Consequently, machine learning tools are highly valuable: they assist clinicians in processing medical datasets, gaining broader insight, planning and managing diseases, and providing better care, which leads to better outcomes, including the elimination of unnecessary costs and increased patient satisfaction.

In this work, we focus on mixture models, one of the main clustering approaches in machine learning. These techniques have demonstrated high potential and flexibility in modeling data. Gaussian mixture models (GMMs) have been widely applied in many fields of research to describe symmetric data. However, for asymmetric and non-Gaussian data, alternatives such as the inverted Dirichlet mixture model can describe the data more accurately. To learn our model, we employ an entropy-based variational approach and then evaluate it on four medical applications.
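
To make the modeling choice concrete, the minimal sketch below (not the chapter's implementation) contrasts a Gaussian mixture baseline with the inverted Dirichlet density, which is defined only for strictly positive, semi-bounded data. The synthetic data, parameter values, and the use of scikit-learn and SciPy are illustrative assumptions; the chapter's entropy-based variational learner is not reproduced here.

# Illustrative sketch: Gaussian mixture baseline vs. inverted Dirichlet density
# for semi-bounded (strictly positive) data. All data and parameters are
# hypothetical; this is not the chapter's entropy-based variational method.
import numpy as np
from scipy.special import gammaln
from sklearn.mixture import GaussianMixture

def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of the inverted Dirichlet distribution.

    x     : array of shape (D,), all entries > 0
    alpha : array of shape (D + 1,), all entries > 0
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # Normalizing constant: Gamma(sum(alpha)) / prod(Gamma(alpha_j))
    log_norm = gammaln(alpha.sum()) - gammaln(alpha).sum()
    return (log_norm
            + np.sum((alpha[:-1] - 1.0) * np.log(x))
            - alpha.sum() * np.log1p(x.sum()))

# Synthetic semi-bounded data: two clusters of strictly positive 2-D points.
rng = np.random.default_rng(0)
data = np.vstack([rng.gamma(shape=2.0, scale=1.0, size=(200, 2)),
                  rng.gamma(shape=9.0, scale=0.5, size=(200, 2))])

# Gaussian mixture baseline: adequate for roughly symmetric data, but it
# ignores the positivity constraint that the inverted Dirichlet respects
# by construction.
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
labels = gmm.predict(data)
print("GMM cluster sizes:", np.bincount(labels))
print("Inverted Dirichlet log-density at one point:",
      inverted_dirichlet_logpdf(data[0], alpha=np.array([2.0, 2.0, 3.0])))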


Acknowledgements

We thank the Natural Sciences and Engineering Research Council of Canada (NSERC), whose support made this research possible.

Author information

Corresponding author

Correspondence to Narges Manouchehri.


Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Manouchehri, N., Rahmanpour, M., Bouguila, N. (2021). Entropy-Based Variational Inference for Semi-Bounded Data Clustering in Medical Applications. In: Masmoudi, M., Jarboui, B., Siarry, P. (eds) Artificial Intelligence and Data Mining in Healthcare. Springer, Cham. https://doi.org/10.1007/978-3-030-45240-7_9

  • DOI: https://doi.org/10.1007/978-3-030-45240-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45239-1

  • Online ISBN: 978-3-030-45240-7

  • eBook Packages: Computer Science, Computer Science (R0)
