Abstract
There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework for modeling the relationship between context and object properties, based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context-driven focus of attention and automatic scale selection in real-world scenes.
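As a rough illustration of the idea of relating scene-wide low-level feature statistics to object presence, the sketch below computes a holistic descriptor for an image and uses it to estimate the probability that an object is present (object priming). It is a minimal sketch under stated assumptions, not the model used in the article: the grid-averaged gradient energies, the diagonal-Gaussian likelihoods, and the names `global_features` and `ContextPrior` are all hypothetical choices made for this example.

```python
import numpy as np


def global_features(image, grid=4):
    """Holistic descriptor (an assumption of this sketch): log mean gradient
    energy in each cell of a grid x grid partition, for both image axes."""
    # np.gradient returns derivatives along axis 0 (rows) and axis 1 (columns).
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    feats = []
    for energy in (gx ** 2, gy ** 2):
        for band in np.array_split(energy, grid, axis=0):
            for cell in np.array_split(band, grid, axis=1):
                feats.append(cell.mean())
    return np.log1p(np.array(feats))


class ContextPrior:
    """Estimates P(object present | context features) with Bayes' rule,
    assuming diagonal-Gaussian likelihoods for each class."""

    def fit(self, features, present):
        # Assumes the training set contains both positive and negative scenes.
        features = np.asarray(features, dtype=float)
        present = np.asarray(present, dtype=bool)
        self.prior = float(np.clip(present.mean(), 1e-6, 1 - 1e-6))
        self.stats = {}
        for label in (True, False):
            x = features[present == label]
            # A small variance floor keeps the likelihood well defined.
            self.stats[label] = (x.mean(axis=0), x.var(axis=0) + 1e-3)
        return self

    def _log_likelihood(self, v, label):
        mean, var = self.stats[label]
        return -0.5 * np.sum((v - mean) ** 2 / var + np.log(2 * np.pi * var))

    def predict(self, v):
        """Posterior probability that the object is present given features v."""
        log_pos = self._log_likelihood(v, True) + np.log(self.prior)
        log_neg = self._log_likelihood(v, False) + np.log(1 - self.prior)
        m = max(log_pos, log_neg)  # subtract the maximum for numerical stability
        pos, neg = np.exp(log_pos - m), np.exp(log_neg - m)
        return float(pos / (pos + neg))
```

In use, one would call `global_features` on each training image, fit `ContextPrior` on those vectors together with labels saying whether the object appears in each scene, and then call `predict` on the features of a new image before running any local detector. The same construction could in principle be extended to location or scale by regressing those quantities on the context features, though that is not shown here.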
Cite this article
Torralba, A. Contextual Priming for Object Detection. International Journal of Computer Vision 53, 169–191 (2003). https://doi.org/10.1023/A:1023052124951