
Model complexity control and statistical learning theory

Published in: Natural Computing (2002)

Abstract

We discuss the problem of model complexity control, also known as model selection. This problem frequently arises in the context of predictive learning and adaptive estimation of dependencies from finite data. First we review the problem of predictive learning as it relates to model complexity control. Then we discuss several issues important for practical implementation of complexity control, using the framework provided by Statistical Learning Theory (or Vapnik-Chervonenkis theory). Finally, we show practical applications of Vapnik-Chervonenkis (VC) generalization bounds for model complexity control. Empirical comparisons of different methods for complexity control suggest practical advantages of using VC-based model selection in settings where VC generalization bounds can be rigorously applied. We also argue that VC-theory provides a methodological framework for complexity control even when its technical results cannot be directly applied.
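The VC-based model selection the abstract refers to can be illustrated with a small sketch. The penalization factor below follows the analytic form proposed for regression by Cherkassky, Shao, Mulier and Vapnik (1999), as commonly stated, with p = h/n for VC dimension h and sample size n; treating a degree-d polynomial as having VC dimension h = d + 1 is an assumption made for this toy example, and the names `vc_penalty` and `select_degree` are illustrative, not from the article:

```python
import numpy as np

def vc_penalty(h, n):
    """VC penalization factor for regression, in the form
    1 / (1 - sqrt(p - p*ln(p) + ln(n)/(2n)))_+ with p = h/n
    (assumed here, after Cherkassky et al., 1999)."""
    p = h / n
    inner = p - p * np.log(p) + np.log(n) / (2 * n)
    denom = 1.0 - np.sqrt(inner)
    if denom <= 0:
        return np.inf  # bound is vacuous: model too complex for this n
    return 1.0 / denom

def select_degree(x, y, max_degree=10):
    """Pick the polynomial degree minimizing VC-penalized empirical risk.
    Assumes VC dimension h = d + 1 for a degree-d polynomial."""
    n = len(x)
    best_d, best_risk = None, np.inf
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, d)          # least-squares fit
        resid = y - np.polyval(coeffs, x)
        emp_risk = np.mean(resid ** 2)        # empirical (training) MSE
        risk = emp_risk * vc_penalty(d + 1, n)
        if risk < best_risk:
            best_d, best_risk = d, risk
    return best_d

# Toy data: noisy sine, a setting where unpenalized fit overfits
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(np.pi * x) + 0.2 * rng.standard_normal(50)
print(select_degree(x, y))
```

The selected degree is the one minimizing penalized empirical risk; the same loop could instead use AIC or BIC penalties, which the article compares empirically against the VC-based approach.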




Cite this article

Cherkassky, V. Model complexity control and statistical learning theory. Natural Computing 1, 109–133 (2002). https://doi.org/10.1023/A:1015007927558
