Abstract
We propose and analyse a population-based version of stepwise gradient descent suitable for a wide range of learning problems. The algorithm is motivated by genetic algorithms, which update a population of solutions rather than a single representative, as is typical for gradient descent. This modification of traditional gradient descent (as used, for example, in the backpropagation algorithm) avoids getting trapped in local minima. We use an averaging analysis of the algorithm to relate its behaviour to an associated ordinary differential equation. We derive a result concerning how long one has to wait in order that, with a given high probability, the algorithm is within a certain neighbourhood of the global minimum. We also analyse the effect of different population sizes. A worked example is presented that corroborates the theory.
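The core idea can be sketched in a few lines: run ordinary gradient descent from many starting points at once, then report the population member with the smallest cost. This is an illustrative sketch only, not the paper's exact algorithm (which also covers stochastic gradients and the probabilistic waiting-time analysis); the cost function below is a toy multimodal example of our own choosing, with a local minimum near x = 1.33 and the global minimum near x = -1.48.

```python
import numpy as np

def congregational_gd(grad, loss, init_pop, step=0.05, iters=2000):
    """Population-based gradient descent sketch: every member of the
    population follows its own gradient trajectory, and the member with
    the smallest loss is returned at the end."""
    pop = [float(x) for x in init_pop]
    for _ in range(iters):
        pop = [x - step * grad(x) for x in pop]
    return min(pop, key=loss)

# Toy multimodal cost (illustrative, not from the paper):
# f(x) = x^4/4 - x^2 + 0.3 x has a local minimum near x = 1.33
# and its global minimum near x = -1.48.
loss = lambda x: 0.25 * x**4 - x**2 + 0.3 * x
grad = lambda x: x**3 - 2 * x + 0.3

# A single start at x = 1.0 descends into the local minimum ...
single = congregational_gd(grad, loss, [1.0])
# ... while a population spread over the domain reaches the global one.
best = congregational_gd(grad, loss, np.linspace(-3.0, 3.0, 9))
```

The population pays a factor-of-population-size cost per step, but (as the paper's analysis quantifies) it buys a high probability of landing near the global minimum rather than a nearby local one.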
This work was supported by the Australian Research Council.
Blackmore, K.L., Williamson, R.C., Mareels, I.M.Y. et al. Online learning via congregational gradient descent. Math. Control Signal Systems 10, 331–363 (1997). https://doi.org/10.1007/BF01211551