A Hybrid Knowledge Based-Clustering Multi-Class SVM Approach for Genes Expression Analysis

Data Mining in Biomedicine

Part of the book series: Springer Optimization and Its Applications ((SOIA,volume 7))

Abstract

This study uses Support Vector Machines (SVM) for multi-class classification of a real data set with more than two classes. The data are a set of E. coli whole-genome gene expression profiles. The problem is to classify these genes according to their behavior in response to changes in the pH of the growth medium and to mutation of the acid tolerance response gene regulator GadX. Before a supervised technique such as SVM can be applied, the genes must be labeled. The labels indicate the response of each gene to the experimental variables: 1 (unchanged), 2 (decreased expression level), and 3 (increased expression level). To label the genes, an unsupervised K-Means clustering technique is applied in a multi-level scheme. Multi-level K-Means clustering is itself an improvement over standard K-Means applications. SVM is used here in two ways. First, the labels resulting from multi-level K-Means clustering are confirmed by SVM. To judge the performance of SVM, two other methods, K-nearest neighbor (KNN) and Linear Discriminant Analysis (LDA), are implemented. The multi-class SVM is implemented with both the one-against-one and the one-against-all methods. The results show that SVM outperforms KNN and LDA; its advantages lie in generalization error and computing time. Second, in contrast to the first application, SVM is used to label the genes after it has been trained on a set of training data obtained from K-Means clustering. This alternative SVM strategy offers an improvement over standard SVM applications.
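The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn, uses synthetic data in place of the E. coli expression profiles, and uses a single plain K-Means pass because the abstract does not specify the multi-level scheme. The cluster-to-label mapping, kernel choice, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for expression profiles (assumption, not the real data):
# 300 "genes" x 4 "conditions", drawn around three well-separated centres.
centres = np.array([[0.0] * 4, [-2.0] * 4, [2.0] * 4])
X = np.vstack([c + 0.5 * rng.standard_normal((100, 4)) for c in centres])

# Step 1: unsupervised labeling with plain K-Means (single level only).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Map arbitrary cluster ids to the abstract's labels by mean centre value:
# lowest -> 2 (decreased), middle -> 1 (unchanged), highest -> 3 (increased).
order = np.argsort(kmeans.cluster_centers_.mean(axis=1))
label_of_cluster = np.empty(3, dtype=int)
label_of_cluster[order] = [2, 1, 3]
y = label_of_cluster[kmeans.labels_]

# Step 2: check the cluster-derived labels with supervised classifiers.
models = {
    # SVC trains pairwise (one-against-one) classifiers internally.
    "svm_one_against_one": SVC(kernel="rbf", decision_function_shape="ovo"),
    # OneVsRestClassifier fits one binary SVM per class (one-against-all).
    "svm_one_against_all": OneVsRestClassifier(SVC(kernel="rbf")),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "lda": LinearDiscriminantAnalysis(),
}
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

# Step 3: second usage, train an SVM on a K-Means-labeled subset and
# use it to label the remaining genes.
idx = rng.permutation(len(X))
train_idx, rest_idx = idx[:150], idx[150:]
svm = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
predicted_rest = svm.predict(X[rest_idx])
agreement = float((predicted_rest == y[rest_idx]).mean())

for name, score in scores.items():
    print(f"{name}: mean 5-fold accuracy = {score:.3f}")
print(f"agreement with K-Means labels on held-out genes = {agreement:.3f}")
```

On such cleanly separated synthetic data all four classifiers will score similarly, so the sketch shows only the mechanics of the comparison and says nothing about the chapter's reported ranking of SVM, KNN, and LDA.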

Copyright information

© 2007 Springer Science+Business Media, LLC

Cite this chapter

Santosa, B., Conway, T., Trafalis, T. (2007). A Hybrid Knowledge Based-Clustering Multi-Class SVM Approach for Genes Expression Analysis. In: Pardalos, P.M., Boginski, V.L., Vazacopoulos, A. (eds) Data Mining in Biomedicine. Springer Optimization and Its Applications, vol 7. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69319-4_15
