Abstract
Machine learning plays a central role in the interpretation of many datasets generated within the biomedical sciences. In this chapter we focus on two core topics within machine learning, supervised and unsupervised learning, and illustrate their application to interpreting these datasets. For supervised learning, we focus on support vector machines (SVMs), which belong to the wider family of kernel-based learning methods. Kernels can be used to encode many different types of data, from continuous and discrete data through to graph and sequence data. Because such a variety of data types is encountered within bioinformatics, kernel methods are a natural choice in this context. With unsupervised learning we are interested in the discovery of structure within data. We start by considering hierarchical cluster analysis (HCA), given its common usage in this context. We then point out the advantages of Bayesian approaches to unsupervised learning, which range from a principled approach to model selection (how many clusters are present in the data) to confidence measures for the assignment of datapoints to clusters. We outline five case studies illustrating these methods. For supervised learning we consider prediction of disease progression in cancer and protein fold prediction. For unsupervised learning we apply HCA to a small colon cancer dataset and then illustrate the use of Bayesian unsupervised learning applied to breast and lung cancer datasets. Finally, we consider network inference, which can be approached as an unsupervised or a supervised learning task, depending on the data available.
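As a small illustration of how a kernel can encode sequence data, the sketch below computes a k-spectrum kernel, in which two sequences are compared through the counts of their shared length-k substrings. This is one common sequence kernel and is not necessarily the one used in the chapter; the function names and the toy sequences are assumptions made for this example only.

```python
from collections import Counter


def spectrum_features(seq, k=3):
    """Count every contiguous k-mer in seq (the k-spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of sequences s and t."""
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    # A Counter returns 0 for k-mers it has not seen, so only shared
    # k-mers contribute to the sum.
    return sum(count * ft[kmer] for kmer, count in fs.items())


def gram_matrix(seqs, k=3):
    """Kernel (Gram) matrix with entries K[i][j] = k(seqs[i], seqs[j])."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


# Toy sequences, invented purely for illustration.
seqs = ["ACGTACGT", "ACGTTTTT", "GGGGCCCC"]
K = gram_matrix(seqs, k=3)
for row in K:
    print(row)
```

Because each entry is an inner product of explicit feature vectors, the resulting matrix is symmetric and positive semidefinite, so it could in principle be passed to an SVM solver that accepts a precomputed kernel.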
Abbreviations
- DAG: directed acyclic graph
- DNA: deoxyribonucleic acid
- EGF: epidermal growth factor
- EGFR: epidermal growth factor receptor
- EM: expectation-maximization
- ERK: extracellular signal-regulated kinase
- GA: genetic algorithm
- HCA: hierarchical cluster analysis
- KL: Kullback–Leibler
- LIBSVM: library for support vector machines
- LOO: leave-one-out
- LPD: latent process decomposition
- MAP: maximum a posteriori
- MCMC: Markov chain Monte Carlo
- MKL: multiple kernel learning
- ML: maximum likelihood
- MRI: magnetic resonance imaging
- ODE: ordinary differential equation
- PSD: positive semidefinite
- QP: quadratic programming
- RNA: ribonucleic acid
- SDP: semidefinite programming
- SVM: support vector machine
- TG: triacylglyceride
- TSA: test set accuracy
- cDNA: complementary DNA
- log: logistic regression
Copyright information
© 2014 Springer-Verlag
About this chapter
Cite this chapter
Campbell, C. (2014). Machine Learning Methodology in Bioinformatics. In: Kasabov, N. (eds) Springer Handbook of Bio-/Neuroinformatics. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30574-0_12
DOI: https://doi.org/10.1007/978-3-642-30574-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30573-3
Online ISBN: 978-3-642-30574-0
eBook Packages: Engineering, Engineering (R0)