
A Trainable System for Object Detection

International Journal of Computer Vision

Abstract

This paper presents a general, trainable system for object detection in unconstrained, cluttered scenes. The system derives much of its power from a representation that describes an object class in terms of an overcomplete dictionary of local, oriented, multiscale intensity differences between adjacent regions, efficiently computable as a Haar wavelet transform. This example-based learning approach implicitly derives a model of an object class by training a support vector machine classifier using a large set of positive and negative examples. We present results on face, people, and car detection tasks using the same architecture. In addition, we quantify how the representation affects detection performance by considering several alternative representations, including pixels and principal components. We also describe a real-time application of our person detection system as part of a driver assistance system.
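The core representation described above, an overcomplete set of local, oriented intensity differences between adjacent regions, can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: it computes unnormalized Haar-style responses in three orientations (left-right, top-bottom, diagonal) for every square window on a regular grid, and it uses a summed-area table (integral image) so that each box sum takes constant time, which is a swapped-in efficiency technique rather than something the abstract specifies. The window `size`, the grid `shift`, and the function names `integral_image`, `box_sum`, and `haar_features` are hypothetical. Choosing a `shift` smaller than `size` makes neighboring windows overlap, which is what makes the dictionary overcomplete.

```python
from typing import List

Image = List[List[float]]


def integral_image(img: Image) -> Image:
    """Summed-area table padded with a zero row and column:
    ii[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: Image, y: int, x: int, h: int, w: int) -> float:
    """Sum of the h-by-w box whose top-left pixel is (y, x), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_features(img: Image, size: int, shift: int) -> List[float]:
    """Three oriented Haar-style differences for every size-by-size window
    placed every `shift` pixels. shift < size gives an overcomplete set."""
    ii = integral_image(img)
    h, w = len(img), len(img[0])
    half = size // 2
    feats: List[float] = []
    for y in range(0, h - size + 1, shift):
        for x in range(0, w - size + 1, shift):
            # Sums over the four quadrants of the current window.
            tl = box_sum(ii, y, x, half, half)
            tr = box_sum(ii, y, x + half, half, half)
            bl = box_sum(ii, y + half, x, half, half)
            br = box_sum(ii, y + half, x + half, half, half)
            left_right = (tr + br) - (tl + bl)  # responds to vertical edges
            top_bottom = (bl + br) - (tl + tr)  # responds to horizontal edges
            diagonal = (tl + br) - (tr + bl)    # responds to diagonal structure
            feats.extend([left_right, top_bottom, diagonal])
    return feats
```

For example, on a 4×4 image whose left half is 0 and right half is 1, `haar_features(img, 4, 4)` yields a single window with a strong left-right response and zero top-bottom and diagonal responses. Concatenating the outputs for several window sizes gives a multiscale feature vector of the kind that, per the abstract, is used to train a support vector machine on positive and negative examples.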




Cite this article

Papageorgiou, C., Poggio, T. A Trainable System for Object Detection. International Journal of Computer Vision 38, 15–33 (2000). https://doi.org/10.1023/A:1008162616689
