Online learning via congregational gradient descent

Mathematics of Control, Signals and Systems (1997)

Abstract

We propose and analyse a populational version of stepwise gradient descent suitable for a wide range of learning problems. The algorithm is motivated by genetic algorithms, which update a population of solutions rather than just a single representative, as is typical of gradient descent. This modification of traditional gradient descent (as used, for example, in the backpropagation algorithm) avoids getting trapped in local minima. We use an averaging analysis of the algorithm to relate its behaviour to an associated ordinary differential equation. We derive a result concerning how long one has to wait in order that, with a given high probability, the algorithm is within a certain neighbourhood of the global minimum. We also analyse the effect of different population sizes. An example is presented which corroborates our theory very well.
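
To make the idea concrete, here is a minimal sketch in Python of one plausible reading of the algorithm: a population of candidate parameter vectors each takes gradient steps independently, and the population is periodically ranked so that poorly performing members are re-seeded at random, which is what lets the search escape local minima. The loss function, the re-seed-the-worse-half rule, and all parameter values below are illustrative assumptions, not the precise update and selection rules analysed in the paper; in the online setting, `grad` would also be the noisy gradient of the loss on each incoming sample rather than of a fixed function.

```python
import numpy as np

def congregational_gd(grad, loss, dim, pop_size=10, step=0.01,
                      n_steps=10_000, reseed_every=1_000, rng=None):
    """Sketch of a populational ("congregational") stepwise gradient descent.

    Every member takes one gradient step per iteration; every
    `reseed_every` iterations the worse half of the population is
    re-initialised at random.  Illustrative only: the paper's precise
    update and selection rules may differ.
    """
    rng = rng or np.random.default_rng(0)
    pop = rng.standard_normal((pop_size, dim))          # initial population
    for t in range(1, n_steps + 1):
        pop -= step * np.array([grad(w) for w in pop])  # one GD step per member
        if t % reseed_every == 0:                       # rank and re-seed
            order = np.argsort([loss(w) for w in pop])
            n_bad = pop_size - pop_size // 2
            pop[order[pop_size // 2:]] = rng.standard_normal((n_bad, dim))
    return min(pop, key=loss)                           # best member found

# Toy example: a tilted double well whose right-hand well is only a local
# minimum.  A single gradient-descent run started near w = 1 stays stuck
# there; the population version reliably finds the global minimum near w = -1.
f = lambda w: float((w[0] ** 2 - 1.0) ** 2 + 0.3 * w[0])
df = lambda w: np.array([4.0 * w[0] * (w[0] ** 2 - 1.0) + 0.3])
print(congregational_gd(df, f, dim=1))
```

With a population of one this reduces to ordinary stepwise gradient descent; increasing `pop_size` trades computation per step for a higher chance that some member starts, or is re-seeded, in the basin of the global minimum, which is the kind of trade-off the paper's population-size analysis addresses.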

Additional information

This work was supported by the Australian Research Council.

Cite this article

Blackmore, K.L., Williamson, R.C., Mareels, I.M.Y. et al. Online learning via congregational gradient descent. Math. Control Signal Systems 10, 331–363 (1997). https://doi.org/10.1007/BF01211551
