Abstract
This paper presents a general, trainable system for object detection in unconstrained, cluttered scenes. The system derives much of its power from a representation that describes an object class in terms of an overcomplete dictionary of local, oriented, multiscale intensity differences between adjacent regions, efficiently computable as a Haar wavelet transform. This example-based learning approach implicitly derives a model of an object class by training a support vector machine classifier using a large set of positive and negative examples. We present results on face, people, and car detection tasks using the same architecture. In addition, we quantify how the representation affects detection performance by considering several alternative representations, including pixels and principal components. We also describe a real-time application of our person detection system as part of a driver assistance system.
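The abstract's core representation (local, oriented intensity differences between adjacent regions, sampled densely over positions) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes unnormalized vertical, horizontal, and diagonal Haar responses over square supports of a single scale, shifts each support by a quarter of its size to make the dictionary overcomplete (the quarter-support shift is an assumption here), and uses a summed-area table so that each region sum costs constant time. The function names `integral_image`, `box_sum`, and `haar_features` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]
Feature = Tuple[str, int, int, float]  # (orientation, top row, left column, response)


def integral_image(img: Image) -> List[List[float]]:
    """Summed-area table padded with a zero row/column: s[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s


def box_sum(s: List[List[float]], y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum of img[y0:y1][x0:x1] (half-open bounds) in O(1) using the table."""
    return s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]


def haar_features(img: Image, size: int, shift_frac: float = 0.25) -> List[Feature]:
    """Oriented Haar responses for square supports of side `size`.

    Supports are shifted by size * shift_frac pixels (assumed 1/4 of the
    support here), so neighboring supports overlap and the set is overcomplete.
    """
    if size % 2 != 0:
        raise ValueError("support size must be even so it splits into quadrants")
    s = integral_image(img)
    h, w = len(img), len(img[0])
    half = size // 2
    step = max(1, int(size * shift_frac))
    feats: List[Feature] = []
    for y in range(0, h - size + 1, step):
        for x in range(0, w - size + 1, step):
            # Sums over the four quadrants of the current support.
            tl = box_sum(s, y, x, y + half, x + half)
            tr = box_sum(s, y, x + half, y + half, x + size)
            bl = box_sum(s, y + half, x, y + size, x + half)
            br = box_sum(s, y + half, x + half, y + size, x + size)
            # Left minus right: responds to vertical edges.
            feats.append(("vertical", y, x, (tl + bl) - (tr + br)))
            # Top minus bottom: responds to horizontal edges.
            feats.append(("horizontal", y, x, (tl + tr) - (bl + br)))
            # Main diagonal minus anti-diagonal: responds to diagonal structure.
            feats.append(("diagonal", y, x, (tl + br) - (tr + bl)))
    return feats


if __name__ == "__main__":
    # A 4x4 patch that is bright on the left and dark on the right.
    patch = [[1, 1, 0, 0] for _ in range(4)]
    for f in haar_features(patch, size=4):
        print(f)
```

On the example patch only the vertical response is nonzero, since the intensity change runs left to right. In a detector, the responses at each chosen scale would be flattened into one feature vector per candidate window and passed to an SVM trainer; that step, and any normalization of the coefficients, is omitted from this sketch.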
About this article
Cite this article
Papageorgiou, C., Poggio, T. A Trainable System for Object Detection. International Journal of Computer Vision 38, 15–33 (2000). https://doi.org/10.1023/A:1008162616689