Abstract
Finding an appropriate set of features is an essential problem in the design of shape recognition systems. This paper attempts to show that for recognizing simple objects with high shape variability, such as handwritten characters, it is possible, and even advantageous, to feed the system directly with minimally processed images and to rely on learning to extract the right set of features. Convolutional Neural Networks are shown to be particularly well suited to this task. We also show that these networks can be used to recognize multiple objects without requiring explicit segmentation of the objects from their surroundings. The second part of the paper presents the Graph Transformer Network model, which extends the applicability of gradient-based learning to systems that use graphs to represent features, objects, and their combinations.
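The abstract argues that a convolutional network can learn its own features from nearly raw pixels. The operation that makes this work is weight sharing: every unit in a feature map applies the same small kernel at a different position of the input. The following is a minimal pure-Python sketch of that shared-weight operation. The function name `conv2d_valid`, the tanh squashing, and the hand-set kernel are illustrative assumptions and not the authors' implementation. In the setting the paper describes, the kernel weights would be learned by gradient descent instead of being fixed by hand.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Apply one shared kernel at every position of `image` (no padding).

    Implemented, as is common in neural-network code, as a cross-correlation
    (the kernel is not flipped). Each output value passes through tanh, a
    sigmoid-like squashing function.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # The same weights are reused at every (r, c): weight sharing.
            s = bias
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(math.tanh(s))
        feature_map.append(row)
    return feature_map


if __name__ == "__main__":
    # Hypothetical hand-set kernel that responds to a left-to-right drop in
    # intensity; in a trained network these weights would be learned.
    edge = [[1.0, -1.0]]
    narrow = [[0, 0, 1, 0]]
    wide = [[0, 0, 1, 0, 0, 0, 1, 0]]
    print(len(conv2d_valid(narrow, edge)[0]))  # 3 positions
    print(len(conv2d_valid(wide, edge)[0]))    # 7 positions
```

Shifting a stroke one column to the right shifts its response one column to the right in the feature map. Widening the input widens the map without changing any weights. The second property suggests why a network trained on single characters can be swept across a whole field of characters, producing one output per position, without first cutting each character out of its surroundings.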
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
LeCun, Y., Haffner, P., Bottou, L., Bengio, Y. (1999). Object Recognition with Gradient-Based Learning. In: Shape, Contour and Grouping in Computer Vision. Lecture Notes in Computer Science, vol 1681. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46805-6_19
DOI: https://doi.org/10.1007/3-540-46805-6_19
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66722-3
Online ISBN: 978-3-540-46805-9
eBook Packages: Springer Book Archive