Abstract
A new type of supervised learning algorithm for estimating multidimensional functions is considered. These methods, based on support vector machines, are widely used because they can handle large, high-dimensional datasets and are flexible in modeling diverse sources of data. Support vector machines and related kernel methods are highly effective at solving prediction problems in computational biology. Background on statistical learning theory and kernel feature spaces is given, including practical and algorithmic considerations.
Additional information
Original Russian Text © N.O. Kadyrova, L.V. Pavlova, 2014, published in Biofizika, 2014, Vol. 59, No. 3, pp. 446–457.
About this article
Cite this article
Kadyrova, N.O., Pavlova, L.V. Statistical analysis of big data: an approach based on support vector machines for classification and regression problems. BIOPHYSICS 59, 364–373 (2014). https://doi.org/10.1134/S0006350914030105
DOI: https://doi.org/10.1134/S0006350914030105