Contextual Priming for Object Detection

International Journal of Computer Vision

Abstract

There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework for modeling the relationship between context and object properties, based on the correlation between the statistics of low-level features computed across the entire scene and the objects that the scene contains. The resulting scheme serves as an effective procedure for object priming, context-driven focus of attention and automatic scale selection in real-world scenes.
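The core idea of priming, predicting object properties from statistics of the whole scene rather than from the object's own pixels, can be illustrated with a toy example. The sketch below covers only the object-priming part and rests on loud assumptions: the holistic descriptor (`global_features`, mean absolute intensity differences pooled on a coarse grid), the diagonal Gaussian class-conditional densities, the synthetic "scenes" and every function name are hypothetical stand-ins chosen for brevity, not the representation or model used in the paper. It estimates P(object present | v) by Bayes' rule, where v is a global feature vector of the whole image (our notation).

```python
import math

def global_features(image, grid=2):
    """Hypothetical holistic descriptor (an assumption, not the paper's):
    mean absolute horizontal (dx) and vertical (dy) intensity differences,
    pooled over a grid x grid layout of image blocks."""
    h, w = len(image), len(image[0])
    feats = []
    for by in range(grid):
        for bx in range(grid):
            y0, y1 = by * h // grid, (by + 1) * h // grid
            x0, x1 = bx * w // grid, (bx + 1) * w // grid
            dx = dy = 0.0
            n = 0
            for y in range(y0, y1 - 1):
                for x in range(x0, x1 - 1):
                    dx += abs(image[y][x + 1] - image[y][x])
                    dy += abs(image[y + 1][x] - image[y][x])
                    n += 1
            feats += [dx / max(n, 1), dy / max(n, 1)]
    return feats

def fit_diag_gaussian(samples, eps=1e-6):
    """Per-dimension mean and variance; eps keeps variances positive."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    var = [sum((s[i] - mean[i]) ** 2 for s in samples) / n + eps for i in range(d)]
    return mean, var

def log_likelihood(v, model):
    """Log density of v under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * math.log(2 * math.pi * s2) - (x - m) ** 2 / (2 * s2)
               for x, m, s2 in zip(v, mean, var))

def prob_present(v, model_present, model_absent, prior_present=0.5):
    """Bayes' rule: P(present | v), computed stably in log space."""
    a = log_likelihood(v, model_present) + math.log(prior_present)
    b = log_likelihood(v, model_absent) + math.log(1 - prior_present)
    m = max(a, b)
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))

# Purely synthetic "scenes": vertical stripes stand in for contexts that
# contain the object, smooth vertical ramps for contexts that do not.
def stripes(h, w, period):
    return [[float((x // period) % 2) for x in range(w)] for _ in range(h)]

def ramp(h, w, slope):
    return [[slope * y / h for _ in range(w)] for y in range(h)]

present = fit_diag_gaussian([global_features(stripes(16, 16, p)) for p in (1, 2, 3)])
absent = fit_diag_gaussian([global_features(ramp(16, 16, s)) for s in (0.5, 1.0, 2.0)])

p_stripes = prob_present(global_features(stripes(16, 16, 4)), present, absent)
p_ramp = prob_present(global_features(ramp(16, 16, 1.5)), present, absent)
print(round(p_stripes, 3), round(p_ramp, 3))
```

A single diagonal Gaussian per class is simply the smallest density model that makes the Bayes-rule step explicit; realistic scene statistics would call for a richer density model and a far more informative global descriptor.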

About this article

Cite this article

Torralba, A. Contextual Priming for Object Detection. International Journal of Computer Vision 53, 169–191 (2003). https://doi.org/10.1023/A:1023052124951