Abstract
We propose and analyse a population-based version of stepwise gradient descent suitable for a wide range of learning problems. The algorithm is motivated by genetic algorithms, which update a population of solutions rather than a single representative, as is typical for gradient descent. This modification of traditional gradient descent (as used, for example, in the backpropagation algorithm) avoids getting trapped in local minima. We use an averaging analysis of the algorithm to relate its behaviour to an associated ordinary differential equation. We derive a result concerning how long one has to wait in order that, with a given high probability, the algorithm is within a certain neighbourhood of the global minimum. We also analyse the effect of different population sizes. A worked example is presented that corroborates the theory.
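The core idea can be sketched in a few lines: run ordinary gradient descent from many starting points at once, then report the population member with the smallest cost. This is an illustrative sketch only, not the paper's exact algorithm (which also covers stochastic gradients and the probabilistic waiting-time analysis); the cost function below is a toy multimodal example of our own choosing, with a local minimum near x = 1.33 and the global minimum near x = -1.48.

```python
import numpy as np

def congregational_gd(grad, loss, init_pop, step=0.05, iters=2000):
    """Population-based gradient descent sketch: every member of the
    population follows its own gradient trajectory, and the member with
    the smallest loss is returned at the end."""
    pop = [float(x) for x in init_pop]
    for _ in range(iters):
        pop = [x - step * grad(x) for x in pop]
    return min(pop, key=loss)

# Toy multimodal cost (illustrative, not from the paper):
# f(x) = x^4/4 - x^2 + 0.3 x has a local minimum near x = 1.33
# and its global minimum near x = -1.48.
loss = lambda x: 0.25 * x**4 - x**2 + 0.3 * x
grad = lambda x: x**3 - 2 * x + 0.3

# A single start at x = 1.0 descends into the local minimum ...
single = congregational_gd(grad, loss, [1.0])
# ... while a population spread over the domain reaches the global one.
best = congregational_gd(grad, loss, np.linspace(-3.0, 3.0, 9))
```

The population pays a factor-of-population-size cost per step, but (as the paper's analysis quantifies) it buys a high probability of landing near the global minimum rather than a nearby local one.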
This work was supported by the Australian Research Council.
Blackmore, K.L., Williamson, R.C., Mareels, I.M.Y. et al. Online learning via congregational gradient descent. Math. Control Signal Systems 10, 331–363 (1997). https://doi.org/10.1007/BF01211551