Statistical analysis of big data: an approach based on support vector machines for classification and regression problems

  • Molecular Biophysics
Biophysics

Abstract

A new class of supervised learning algorithms for estimating multidimensional functions is considered. These methods, based on support vector machines (SVMs), are widely used because they can handle large, high-dimensional datasets and are flexible in modeling diverse sources of data. Support vector machines and related kernel methods perform very well on prediction problems in computational biology. Background on statistical learning theory and kernel feature spaces is given, including practical and algorithmic considerations.
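As a brief illustration of the kernel feature spaces mentioned in the abstract, the sketch below checks numerically that a kernel evaluated in the input space equals an inner product in an explicit feature space. It is not taken from the article: the choice of the homogeneous degree-2 polynomial kernel, the 2-D inputs, and the names `poly2_kernel`, `phi`, and `dot` are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Plain Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs such that
    dot(phi(x), phi(z)) == poly2_kernel(x, z)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


# Two arbitrary 2-D points (illustrative values only).
x, z = (1.0, 2.0), (3.0, -1.0)

k_implicit = poly2_kernel(x, z)    # computed in the input space
k_explicit = dot(phi(x), phi(z))   # computed in the feature space

# The two values agree up to floating-point rounding.
print(k_implicit, k_explicit)
```

The kernel is evaluated without ever forming `phi(x)`, which is what makes kernel methods practical when the feature space is very high-dimensional.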

Author information

Corresponding author

Correspondence to N. O. Kadyrova.

Additional information

Original Russian Text © N.O. Kadyrova, L.V. Pavlova, 2014, published in Biofizika, 2014, Vol. 59, No. 3, pp. 446–457.

About this article

Cite this article

Kadyrova, N.O., Pavlova, L.V. Statistical analysis of big data: an approach based on support vector machines for classification and regression problems. BIOPHYSICS 59, 364–373 (2014). https://doi.org/10.1134/S0006350914030105

  • DOI: https://doi.org/10.1134/S0006350914030105
