Abstract
Predictive learning has traditionally been studied in applied mathematics (function approximation), statistics (nonparametric regression), and engineering (pattern recognition). More recently, the fields of artificial intelligence (machine learning) and connectionism (neural networks) have emerged, increasing interest in this problem in terms of both wider application and methodological advances. This paper reviews the underlying principles of many of the practical approaches developed in these fields, with the goal of placing them in a common perspective and providing a unifying overview.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Friedman, J.H. (1994). An Overview of Predictive Learning and Function Approximation. In: Cherkassky, V., Friedman, J.H., Wechsler, H. (eds) From Statistics to Neural Networks. NATO ASI Series, vol 136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-79119-2_1
DOI: https://doi.org/10.1007/978-3-642-79119-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-79121-5
Online ISBN: 978-3-642-79119-2
eBook Packages: Springer Book Archive