Abstract
We discuss the problem of model complexity control, also known as model selection. This problem frequently arises in the context of predictive learning and adaptive estimation of dependencies from finite data. First, we review the problem of predictive learning as it relates to model complexity control. Then we discuss several issues important for practical implementation of complexity control, using the framework provided by Statistical Learning Theory (or Vapnik-Chervonenkis theory). Finally, we show practical applications of Vapnik-Chervonenkis (VC) generalization bounds for model complexity control. Empirical comparisons of different methods for complexity control suggest practical advantages of using VC-based model selection in settings where VC generalization bounds can be rigorously applied. We also argue that VC-theory provides a methodological framework for complexity control even when its technical results cannot be directly applied.
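To make the idea of VC-based model selection concrete, the following is a minimal sketch, not the paper's own implementation: it selects a polynomial degree by minimizing empirical risk multiplied by a VC penalization factor of the form r(p, n) = 1 / (1 - sqrt(p - p ln p + ln(n)/(2n))), with p = h/n, a form commonly used in the VC-based regression literature. The exact constants in the penalty, the choice of h = d + 1 for degree-d polynomials, and the synthetic cubic target are all assumptions of this sketch.

```python
import numpy as np

def vc_penalty(n, h):
    """VC penalization factor r(p, n) with p = h / n.

    A common form from the VC-based model selection literature;
    the exact constants are an assumption of this sketch. Returns
    infinity when the bound degenerates (argument goes nonpositive).
    """
    p = h / n
    arg = 1.0 - np.sqrt(p - p * np.log(p) + np.log(n) / (2 * n))
    return np.inf if arg <= 0 else 1.0 / arg

def select_polynomial_degree(x, y, max_degree=10):
    """Pick the degree minimizing penalized empirical risk.

    The VC-dimension of degree-d polynomials in one variable is
    taken as h = d + 1 (number of free parameters).
    """
    n = len(x)
    best_d, best_risk = None, np.inf
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, d)          # least-squares fit of degree d
        resid = y - np.polyval(coeffs, x)
        emp_risk = np.mean(resid ** 2)        # empirical (training) MSE
        risk = emp_risk * vc_penalty(n, d + 1)
        if risk < best_risk:
            best_d, best_risk = d, risk
    return best_d

# Noisy samples of a cubic target: the penalized criterion should
# settle on a low degree instead of chasing the noise.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = x ** 3 - 0.5 * x + 0.1 * rng.standard_normal(50)
print(select_polynomial_degree(x, y))
```

The penalty grows with h/n, so richer model classes must earn their extra flexibility by a large drop in empirical risk; this is the same trade-off that analytic criteria such as AIC implement, but with a multiplicative factor derived from VC bounds rather than an additive parameter count.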
Cite this article
Cherkassky, V. Model complexity control and statistical learning theory. Natural Computing 1, 109–133 (2002). https://doi.org/10.1023/A:1015007927558