Contextual Priming for Object Detection

Torralba, Antonio

doi:10.1023/A:1023052124951

Published: July 2003

Volume 53, pages 169–191, (2003)

International Journal of Computer Vision

Antonio Torralba¹

Abstract

There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context-driven focus of attention and automatic scale selection on real-world scenes.

Author information

Authors and Affiliations

Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Antonio Torralba

About this article

Cite this article

Torralba, A. Contextual Priming for Object Detection. International Journal of Computer Vision 53, 169–191 (2003). https://doi.org/10.1023/A:1023052124951

Issue Date: July 2003
DOI: https://doi.org/10.1023/A:1023052124951
