Abstract
Prequential model selection and delete-one cross-validation are data-driven methodologies for choosing between rival models on the basis of their predictive abilities. For a given set of observations, the predictive ability of a model is measured by the model's accumulated prediction error (for prequential model selection) and by the model's average out-of-sample prediction error (for delete-one cross-validation). In this paper, given i.i.d. observations, we propose nonparametric regression estimators—based on neural networks—that select the number of "hidden units" (or "neurons") using either prequential model selection or delete-one cross-validation. As our main contributions: (i) we establish rates of convergence for the integrated mean-squared errors in estimating the regression function using "off-line" or "batch" versions of the proposed estimators, and (ii) we establish rates of convergence for the time-averaged expected prediction errors in using "on-line" versions of the proposed estimators. We also present computer simulations (i) empirically validating the proposed estimators and (ii) empirically comparing the proposed estimators with certain novel prequential and cross-validated "mixture" regression estimators.
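The two selection criteria described above can be illustrated with a minimal sketch. Assuming a simple stand-in for the paper's setting (polynomial degree playing the role of the number of hidden units, and least-squares fitting playing the role of neural-network training — both hypothetical simplifications, not the authors' estimators), the prequential score accumulates one-step-ahead squared prediction errors, while the delete-one cross-validation score averages held-out squared errors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical i.i.d. sample from y = f(x) + noise; polynomial degree
# stands in for the number of hidden units as the complexity index.
n = 60
x = rng.uniform(-1.0, 1.0, n)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)

def fit_predict(x_tr, y_tr, x_te, degree):
    """Least-squares polynomial fit; a stand-in for neural-net training."""
    coefs = np.polyfit(x_tr, y_tr, degree)
    return np.polyval(coefs, x_te)

def prequential_score(x, y, degree, warmup=5):
    """Accumulated one-step-ahead squared prediction error:
    at each time t, fit on observations 1..t-1 and predict observation t."""
    errs = []
    for t in range(warmup, len(x)):
        pred = fit_predict(x[:t], y[:t], x[t:t + 1], degree)
        errs.append((y[t] - pred[0]) ** 2)
    return float(np.sum(errs))

def loo_cv_score(x, y, degree):
    """Average delete-one out-of-sample squared prediction error:
    for each i, fit on all observations except i and predict observation i."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = fit_predict(x[mask], y[mask], x[i:i + 1], degree)
        errs.append((y[i] - pred[0]) ** 2)
    return float(np.mean(errs))

# Each criterion selects the complexity that minimizes its score.
degrees = range(1, 8)
best_preq = min(degrees, key=lambda d: prequential_score(x, y, d))
best_cv = min(degrees, key=lambda d: loo_cv_score(x, y, d))
print("prequential choice:", best_preq, "| delete-one CV choice:", best_cv)
```

Both criteria reuse only the data at hand: the prequential score mimics on-line prediction (each point is predicted before it is "seen"), while delete-one cross-validation mimics batch out-of-sample performance.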
Cite this article
Modha, D.S., Masry, E. Prequential and Cross-Validated Regression Estimation. Machine Learning 33, 5–39 (1998). https://doi.org/10.1023/A:1007577530334