Abstract
We review the application of statistical mechanics methods to the study of online learning of a drifting concept in the limit of large systems. We analyze a model in which a feed-forward network learns from examples generated by a time-dependent teacher of the same architecture. The best possible generalization ability is determined exactly by means of a variational method. This constructive method also suggests a learning algorithm, which depends, however, on quantities that are unavailable in practice, such as the present performance of the student. Constructing estimators for these quantities permits the implementation of a very effective, highly adaptive algorithm. Several other algorithms are also studied and compared with the optimal bound and with the adaptive algorithm for different types of time evolution of the rule.
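The teacher-student setting described above can be illustrated with a small simulation. The sketch below is not the variational algorithm of the paper: it uses a plain Hebbian rule with weight decay, chosen only because it is simple to state. The random-walk drift model, the Gaussian inputs, and the parameter names `drift` and `decay` are assumptions made for illustration. For a single-layer perceptron with isotropic inputs, the generalization error is the angle between the student and teacher vectors divided by pi.

```python
import math
import random


def normalize(v):
    """Return v rescaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def generalization_error(student, teacher):
    """Perceptron generalization error: angle between the vectors over pi."""
    dot = sum(j * b for j, b in zip(student, teacher))
    norms = math.sqrt(sum(j * j for j in student) * sum(b * b for b in teacher))
    rho = max(-1.0, min(1.0, dot / norms))  # clamp against rounding errors
    return math.acos(rho) / math.pi


def simulate(n=50, steps=2000, drift=0.3, decay=1.0, seed=0):
    """Online learning of a randomly drifting teacher (illustrative only).

    drift and decay are hypothetical parameters of this sketch: drift sets
    the size of the teacher's random step per example, decay sets how fast
    the student forgets old examples.
    """
    rng = random.Random(seed)
    teacher = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
    student = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(steps):
        # The teacher takes a small random step and is projected back
        # onto the unit sphere.
        teacher = normalize(
            [b + drift / n * rng.gauss(0.0, 1.0) for b in teacher]
        )
        # A new example is labelled by the current teacher.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        field = sum(b * xi for b, xi in zip(teacher, x))
        label = 1.0 if field >= 0.0 else -1.0
        # Hebbian update with weight decay: old examples fade, so the
        # student can follow the moving teacher.
        student = [
            (1.0 - decay / n) * j + label * xi / n
            for j, xi in zip(student, x)
        ]
    return generalization_error(student, teacher)


if __name__ == "__main__":
    print("final generalization error:", simulate())
```

In this sketch a larger `decay` makes the student forget faster, which helps it follow a fast-drifting teacher at the price of averaging over fewer examples. An adaptive algorithm of the kind discussed in the abstract would adjust this trade-off from estimates of the student's own performance.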
Vicente, R., Kinouchi, O. & Caticha, N. Statistical Mechanics of Online Learning of Drifting Concepts: A Variational Approach. Machine Learning 32, 179–201 (1998). https://doi.org/10.1023/A:1007428731714