Abstract
We discuss the problem of model complexity control, also known as model selection. This problem frequently arises in the context of predictive learning and adaptive estimation of dependencies from finite data. First, we review the problem of predictive learning as it relates to model complexity control. Then we discuss several issues important for practical implementation of complexity control, using the framework provided by Statistical Learning Theory (or Vapnik-Chervonenkis theory). Finally, we show practical applications of Vapnik-Chervonenkis (VC) generalization bounds for model complexity control. Empirical comparisons of different methods for complexity control suggest practical advantages of using VC-based model selection in settings where VC generalization bounds can be rigorously applied. We also argue that VC-theory provides a methodological framework for complexity control even when its technical results cannot be directly applied.
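To make the idea of VC-based model selection concrete, the following is a minimal sketch, not the paper's own implementation: it selects a polynomial degree by minimizing empirical risk multiplied by a VC penalization factor of the form r(p, n) = 1 / (1 - sqrt(p - p ln p + ln(n)/(2n))), with p = h/n, a form commonly used in the VC-based regression literature. The exact constants in the penalty, the choice of h = d + 1 for degree-d polynomials, and the synthetic cubic target are all assumptions of this sketch.

```python
import numpy as np

def vc_penalty(n, h):
    """VC penalization factor r(p, n) with p = h / n.

    A common form from the VC-based model selection literature;
    the exact constants are an assumption of this sketch. Returns
    infinity when the bound degenerates (argument goes nonpositive).
    """
    p = h / n
    arg = 1.0 - np.sqrt(p - p * np.log(p) + np.log(n) / (2 * n))
    return np.inf if arg <= 0 else 1.0 / arg

def select_polynomial_degree(x, y, max_degree=10):
    """Pick the degree minimizing penalized empirical risk.

    The VC-dimension of degree-d polynomials in one variable is
    taken as h = d + 1 (number of free parameters).
    """
    n = len(x)
    best_d, best_risk = None, np.inf
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, d)          # least-squares fit of degree d
        resid = y - np.polyval(coeffs, x)
        emp_risk = np.mean(resid ** 2)        # empirical (training) MSE
        risk = emp_risk * vc_penalty(n, d + 1)
        if risk < best_risk:
            best_d, best_risk = d, risk
    return best_d

# Noisy samples of a cubic target: the penalized criterion should
# settle on a low degree instead of chasing the noise.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = x ** 3 - 0.5 * x + 0.1 * rng.standard_normal(50)
print(select_polynomial_degree(x, y))
```

The penalty grows with h/n, so richer model classes must earn their extra flexibility by a large drop in empirical risk; this is the same trade-off that analytic criteria such as AIC implement, but with a multiplicative factor derived from VC bounds rather than an additive parameter count.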
Cite this article
Cherkassky, V. Model complexity control and statistical learning theory. Natural Computing 1, 109–133 (2002). https://doi.org/10.1023/A:1015007927558