Abstract
Neural networks are statistical models, and learning rules are estimators. In this paper, a theory for measuring generalisation is developed by combining Bayesian decision theory with information geometry. The performance of an estimator is measured by the information divergence between the true distribution and the estimate, averaged over the Bayesian posterior. This unifies the majority of error measures currently in use. The optimal estimators also reveal some intricate interrelationships among information geometry, Banach spaces and sufficient statistics.
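As an illustrative sketch (not taken from the chapter), the posterior-averaged information divergence described above can be estimated by Monte Carlo for a discrete distribution with a Dirichlet posterior. The category counts, the flat prior, and the two candidate estimators below are all hypothetical choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_divergence(p, q):
    # Information (KL) divergence D(p || q) between discrete distributions.
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def average_divergence(posterior_samples, estimate):
    # Divergence from the "true" distribution to the estimate,
    # averaged over draws from the Bayesian posterior.
    return float(np.mean([kl_divergence(p, estimate) for p in posterior_samples]))

# Hypothetical data: three categories observed with these counts.
counts = np.array([3.0, 5.0, 2.0])

# Posterior over the unknown distribution under a flat Dirichlet prior.
posterior_samples = rng.dirichlet(counts + 1.0, size=5000)

mle = counts / counts.sum()                        # maximum-likelihood estimate
post_mean = (counts + 1.0) / (counts.sum() + 3.0)  # posterior-mean estimate

print(average_divergence(posterior_samples, mle))
print(average_divergence(posterior_samples, post_mean))
```

Under this particular loss the posterior mean minimises the averaged divergence, so its score comes out below the MLE's: the choice of error measure picks out the optimal estimator, which is the kind of relationship the chapter studies in general.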
References
D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A learning algorithm for Boltzmann machines, Cog. Sci., Vol. 9 (1985), pp147–169.
S. Amari, Differential-Geometrical Methods in Statistics, Vol. 28 of Springer Lecture Notes in Statistics. Springer-Verlag, New York 1985.
S. Amari, Differential geometrical theory of statistics, In Amari et al. [4], Ch. 2, pp19–94.
S. Amari, O. E. Barndorff-Nielsen, R. E. Kass, S. L. Lauritzen, and C. R. Rao, eds., Differential Geometry in Statistical Inference, Vol. 10 of IMS Lecture Notes Monograph. IMS, Hayward, CA (1987).
S. J. Hanson, J. D. Cowan, and C. L. Giles, eds., Advances in Neural Information Processing Systems, Vol. 5 (1993), San Mateo, CA. Morgan Kaufmann.
R. E. Kass, Canonical parameterization and zero parameter effects curvature, J. Roy. Stat. Soc., B, Vol. 46 (1984), pp86–92.
S. L. Lauritzen, Statistical manifolds, in: Amari et al. [4], Ch. 4, pp163–216.
D. J. C. MacKay, Bayesian Methods for Adaptive Models. PhD thesis, California Institute of Technology, Pasadena, CA (1992)
D. J. C. MacKay, Hyperparameters: Optimise, or integrate out?, Technical report, Cambridge (1993).
R. M. Neal, Bayesian learning via stochastic dynamics, in: Hanson et al. [5], pp475–482.
H. White, Learning in artificial neural networks: A statistical perspective, Neural Computation, Vol. 1(4) (1989), pp425–464.
D. H. Wolpert, On the use of evidence in neural networks, in: Hanson et al. [5], pp539–546.
H. Zhu, Neural Networks and Adaptive Computers: Theory and Methods of Stochastic Adaptive Computations. PhD thesis, Dept. of Stat. & Comp. Math., Liverpool University (1993), ftp://archive.cis.ohio-state.edu/pub/neuroprose/Thesis/zhu.thesis.ps.Z
H. Zhu and R. Rohwer, Bayesian invariant measurements of generalisation for continuous distributions, Technical Report NCRG/4352, Dept. Comp. Sci. & Appl. Math., Aston University (August 1995), ftp://cs.aston.ac.uk/neural/zhuh/continuous.ps.Z.
H. Zhu and R. Rohwer, Bayesian invariant measurements of generalisation for discrete distributions, Technical Report NCRG/4351, Dept. Comp. Sci. & Appl. Math., Aston University (August 1995), ftp://cs.aston.ac.uk/neural/zhuh/discrete.ps.Z.
H. Zhu and R. Rohwer, Information geometric measurements of generalisation, Technical Report NCRG/4350, Dept. Comp. Sci. & Appl. Math., Aston University (August 1995), ftp://cs.aston.ac.uk/neural/zhuh/generalisation.ps.Z.
© 1997 Springer Science+Business Media New York
Zhu, H., Rohwer, R. (1997). Measurements of Generalisation Based on Information Geometry. In: Ellacott, S.W., Mason, J.C., Anderson, I.J. (eds) Mathematics of Neural Networks. Operations Research/Computer Science Interfaces Series, vol 8. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-6099-9_69
Print ISBN: 978-1-4613-7794-8
Online ISBN: 978-1-4615-6099-9