Abstract
This study applies Support Vector Machines (SVM) to the multi-class classification of a real data set: E. coli whole-genome gene expression profiles. The problem is to classify the genes by their behavior in response to changes in the pH of the growth medium and to mutation of the acid tolerance response gene regulator GadX. Before these techniques can be applied, the genes must be labeled. The labels indicate each gene's response to the experimental variables: 1 (unchanged), 2 (decreased expression level), and 3 (increased expression level). To label the genes, unsupervised K-Means clustering is applied in a multi-level scheme; multi-level K-Means clustering is itself an improvement over standard K-Means applications. SVM is used here in two ways. First, the labels produced by multi-level K-Means clustering are confirmed by SVM. To judge the performance of SVM, two other methods, K-nearest neighbor (KNN) and Linear Discriminant Analysis (LDA), are implemented. The multi-class SVM is implemented with both the one-against-one and the one-against-all methods. The results show that SVM outperforms KNN and LDA, with advantages in generalization error and computing time. Second, in contrast to the first application, SVM is used to label the genes after being trained on a set of training data obtained from K-Means clustering. This alternative SVM strategy offers an improvement over standard SVM applications.
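The first use of SVM described above (label by clustering, then confirm the labels with several classifiers) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the chapter's implementation: the data are synthetic stand-ins for expression profiles, plain single-level K-Means replaces the multi-level scheme, scikit-learn replaces whatever SVM toolbox the authors used, and the kernel, neighbor count, and fold count are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for expression profiles (assumption): 3 groups of
# 60 "genes", each measured under 4 hypothetical conditions.
centers = np.array([
    [0.0, 0.0, 0.0, 0.0],      # roughly unchanged
    [-2.0, -2.0, -2.0, -2.0],  # roughly decreased expression
    [2.0, 2.0, 2.0, 2.0],      # roughly increased expression
])
X = np.vstack([c + 0.5 * rng.standard_normal((60, 4)) for c in centers])

# Step 1: derive labels without supervision. Plain K-Means is used here;
# the chapter's multi-level scheme is not reproduced.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Step 2: check how well each classifier reproduces the cluster labels
# under 5-fold cross-validation.
classifiers = {
    "SVM one-against-one": OneVsOneClassifier(SVC(kernel="rbf")),
    "SVM one-against-all": OneVsRestClassifier(SVC(kernel="rbf")),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LDA": LinearDiscriminantAnalysis(),
}
scores = {
    name: cross_val_score(clf, X, labels, cv=5).mean()
    for name, clf in classifiers.items()
}

for name, score in scores.items():
    print(f"{name}: mean accuracy {score:.3f}")
```

On data this cleanly separated, all four classifiers should agree closely with the cluster labels, so the sketch shows only the workflow; it says nothing about the relative performance reported in the chapter.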
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Santosa, B., Conway, T., Trafalis, T. (2007). A Hybrid Knowledge Based-Clustering Multi-Class SVM Approach for Genes Expression Analysis. In: Pardalos, P.M., Boginski, V.L., Vazacopoulos, A. (eds) Data Mining in Biomedicine. Springer Optimization and Its Applications, vol 7. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69319-4_15
DOI: https://doi.org/10.1007/978-0-387-69319-4_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-69318-7
Online ISBN: 978-0-387-69319-4
eBook Packages: Biomedical and Life Sciences (R0)