
Measurements of Generalisation Based on Information Geometry

  • Chapter

Part of the book series: Operations Research/Computer Science Interfaces Series (ORCS, volume 8)

Abstract

Neural networks are statistical models and learning rules are estimators. In this paper a theory for measuring generalisation is developed by combining Bayesian decision theory with information geometry. The performance of an estimator is measured by the information divergence between the true distribution and the estimate, averaged over the Bayesian posterior. This unifies the majority of error measures currently in use. The optimal estimators also reveal some intricate interrelationships among information geometry, Banach spaces and sufficient statistics.
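As a concrete illustration of the measure described in the abstract, the sketch below approximates a posterior-averaged information divergence for a discrete distribution: candidate "true" distributions are drawn from a Dirichlet posterior, and the Kullback-Leibler divergence to a fixed estimate is averaged over those draws. This is not code from the chapter; the choice of a Dirichlet posterior, the function names kl_divergence and posterior_averaged_divergence, and the sample counts are assumptions made purely for illustration.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def posterior_averaged_divergence(alpha_posterior, q_estimate, n_samples=10000, seed=0):
    """Monte Carlo estimate of E_posterior[ D(p_true || q_estimate) ],
    with the posterior over the unknown true distribution taken (for this
    illustration only) to be Dirichlet(alpha_posterior)."""
    rng = np.random.default_rng(seed)
    # Each row is one plausible "true" distribution drawn from the posterior.
    samples = rng.dirichlet(alpha_posterior, size=n_samples)
    return float(np.mean([kl_divergence(p, q_estimate) for p in samples]))

# Example: score two estimators of a 3-outcome distribution after observing counts [7, 2, 1].
counts = np.array([7.0, 2.0, 1.0])
alpha = 1.0 + counts                              # uniform Dirichlet prior plus observed counts
mle = counts / counts.sum()                       # maximum-likelihood estimate
laplace = (counts + 1.0) / (counts + 1.0).sum()   # Laplace-smoothed estimate
print(posterior_averaged_divergence(alpha, mle))
print(posterior_averaged_divergence(alpha, laplace))
```

Under a posterior-weighted score of this kind, smoothed estimates typically compare favourably with the raw maximum-likelihood estimate when counts are small, which is the sort of comparison the chapter's framework aims to make precise and invariant.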


References

  1. D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A learning algorithm for Boltzmann machines, Cog. Sci., Vol. 9 (1985), pp. 147–169.

  2. S. Amari, Differential-Geometrical Methods in Statistics, Vol. 28 of Springer Lecture Notes in Statistics. Springer-Verlag, New York (1985).

  3. S. Amari, Differential geometrical theory of statistics, in: Amari et al. [4], Ch. 2, pp. 19–94.

  4. S. Amari, O. E. Barndorff-Nielsen, R. E. Kass, S. L. Lauritzen, and C. R. Rao, eds., Differential Geometry in Statistical Inference, Vol. 10 of IMS Lecture Notes–Monograph Series. IMS, Hayward, CA (1987).

  5. S. J. Hanson, J. D. Cowan, and C. L. Giles, eds., Advances in Neural Information Processing Systems, Vol. 5 (1993), San Mateo, CA. Morgan Kaufmann.

  6. R. E. Kass, Canonical parameterization and zero parameter effects curvature, J. Roy. Stat. Soc. B, Vol. 46 (1984), pp. 86–92.

  7. S. L. Lauritzen, Statistical manifolds, in: Amari et al. [4], Ch. 4, pp. 163–216.

  8. D. J. C. MacKay, Bayesian Methods for Adaptive Models. PhD thesis, California Institute of Technology, Pasadena, CA (1992).

  9. D. J. C. MacKay, Hyperparameters: Optimise, or integrate out?, Technical report, Cambridge (1993).

  10. R. M. Neal, Bayesian learning via stochastic dynamics, in: Hanson et al. [5], pp. 475–482.

  11. H. White, Learning in artificial neural networks: A statistical perspective, Neural Computation, Vol. 1(4) (1989), pp. 425–464.

  12. D. H. Wolpert, On the use of evidence in neural networks, in: Hanson et al. [5], pp. 539–546.

  13. H. Zhu, Neural Networks and Adaptive Computers: Theory and Methods of Stochastic Adaptive Computations. PhD thesis, Dept. of Stat. & Comp. Math., Liverpool University (1993), ftp://archive.cis.ohio-state.edu/pub/neuroprose/Thesis/zhu.thesis.ps.Z

  14. H. Zhu and R. Rohwer, Bayesian invariant measurements of generalisation for continuous distributions, Technical Report NCRG/4352, Dept. Comp. Sci. & Appl. Math., Aston University (August 1995), ftp://cs.aston.ac.uk/neural/zhuh/continuous.ps.Z

  15. H. Zhu and R. Rohwer, Bayesian invariant measurements of generalisation for discrete distributions, Technical Report NCRG/4351, Dept. Comp. Sci. & Appl. Math., Aston University (August 1995), ftp://cs.aston.ac.uk/neural/zhuh/discrete.ps.Z

  16. H. Zhu and R. Rohwer, Information geometric measurements of generalisation, Technical Report NCRG/4350, Dept. Comp. Sci. & Appl. Math., Aston University (August 1995), ftp://cs.aston.ac.uk/neural/zhuh/generalisation.ps.Z


Copyright information

© 1997 Springer Science+Business Media New York

About this chapter

Cite this chapter

Zhu, H., Rohwer, R. (1997). Measurements of Generalisation Based on Information Geometry. In: Ellacott, S.W., Mason, J.C., Anderson, I.J. (eds) Mathematics of Neural Networks. Operations Research/Computer Science Interfaces Series, vol 8. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-6099-9_69


  • DOI: https://doi.org/10.1007/978-1-4615-6099-9_69

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7794-8

  • Online ISBN: 978-1-4615-6099-9

