A review of log-polar imaging for visual perception in robotics

https://doi.org/10.1016/j.robot.2009.10.002

Abstract

Log-polar imaging encompasses a family of methods that represent visual information with a space-variant resolution inspired by the visual system of mammals. It has been studied for about three decades and has surpassed conventional approaches in robotics applications, mainly those where real-time constraints make it necessary to utilize resource-economic image representations and processing methodologies. This paper surveys the application of log-polar imaging in robotic vision, particularly in visual attention, target tracking, egomotion estimation, and 3D perception. The concise yet comprehensive review offered in this paper is intended to provide novice and experienced roboticists with a quick and gentle overview of log-polar vision and to motivate vision researchers to investigate the many open problems that still need solving. To help readers identify promising research directions, a possible research agenda is outlined. Finally, since log-polar vision is not restricted to robotics, a couple of other areas of application are discussed.

Introduction

Both natural and artificial visual systems have to deal with large amounts of information coming from the surrounding environment. When real-time operation is required, as happens with animals or robots in dynamic and unstructured environments, image acquisition and processing must be performed in a very short time (a few milliseconds) in order to provide a sufficiently fast response to external stimuli. Appropriate sensor geometries and image representations are essential for the efficiency of the full visual processing stream. To address this problem it is wise to look for the solutions present in biological systems, which have been optimized by millions of years of evolution. For instance, the visual system of many animals exhibits a non-uniform structure, where the receptive fields1 represent certain parts of the visual field more densely and acutely. In the case of mammals, whose eyes are able to move, retinas present a single high-resolution area in the center of the visual field, called the fovea. The distribution of receptive fields within the retina is fixed and the fovea can be redirected to other targets by ocular movements. The same structure is also commonly used in robot systems with moving cameras [2], [3], [4], [5], [6].

In the late 70s computer vision researchers broke new ground by considering the foveal nature of the visual systems of primates as an alternative to conventional uniform resolution sensors for artificial perception in computers and robots. Earlier on, biological findings in the visual cortex of monkeys [7] had shown that the displacement of a light stimulus in the retina produces displacements in the cortex that are inversely proportional to the distance to the fovea. This effect, also known as cortical magnification, indicates a general scaling behavior by which both receptive field spacing and size increase linearly with eccentricity, i.e. the distance from the fovea [8]. It was found that responses to linear stimuli originating in the fovea lie roughly along lines in the cortex, and circular stimuli centered on the fovea produce linear responses in the cortex at approximately orthogonal orientations [9]. Thus, the information transmitted between the retina and the visual cortex is organized in an approximate logarithmic-polar law [10].
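The retino-cortical organization just described is commonly formalized with the complex logarithm; as a sketch of the standard formulation (the notation here is ours): a retinal point is written as a complex number and the cortical coordinate is its logarithm,

```latex
z = x + iy = \rho\, e^{i\theta}, \qquad w = \log z = \log\rho + i\,\theta ,
```

so that scalings and rotations of the retinal image about the fovea become pure translations of the cortical image:

```latex
\log(s z) = \log z + \log s, \qquad \log\!\big(e^{i\phi} z\big) = \log z + i\phi .
```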

The foveal structure of the retina of some animals is, together with their ability to move the eyes, a fundamental mechanism in the control of visual perception. In the late 80s and early 90s, researchers started exploiting eye movements to achieve complex visual tasks. The paradigm of active vision emerged as a powerful concept to endow an active observer with the ability to find more efficient solutions to problems that, from a passive vision perspective, were ill-posed and non-linear [11]. The idea can be generalized beyond pure perception by including manipulation. The smart usage of robot arms and hands, for instance, opens up many more possibilities for better visual perception [12]. A great deal of excitement was aroused at that time with regard to the present and future possibilities of active vision [13]. For instance, by purposefully moving the eyes, an observer with a foveal low resolution sensor can acquire a “virtual” high resolution image of its entire field of view [14]. Therefore, following on as the next natural step forward, a new concept appeared: space-variant active vision [15].

Since then, efforts have been made to explore the advantages that foveal-like log-polar imaging can bring to robotic applications. However, after almost three decades since those initial studies, no systematic and comprehensive work has been published that attempts to review past research on the topic. While a careful review of log-polar models was conducted ten years ago in [16], its focus was a detailed study and comparison of log-polar mapping templates and models with overlapping receptive fields. More recently, the motivations for retina-like sensors and the properties of the log-polar mapping were nicely considered in [17], [18]. Another paper [19] surveyed foveated sensors with a particular emphasis on image processing issues.

Since these few review-like papers did not consider the applications and usages of log-polar images in depth, we feel that such an analysis is needed, to reflect on past achievements, discuss current challenges, and predict future developments. Additionally, such a literature overview would be a valuable aid to any researcher interested in approaching the field, particularly to beginners. Finally, another important benefit of such a survey is that of helping to promote further work, both on the theoretical and practical sides of log-polar vision.

Therefore, the present survey aims to complement these previous reviews, by looking further into the variety of applications of log-polar sensing that have been proposed. Furthermore, our analysis pays particular attention to robotic applications. Hence, the usage of the log-polar transform for pattern recognition issues, though important, is not considered here. Studies and reviews on this other perspective can be found elsewhere [20], [21], [22], [23].

An overview of the log-polar mapping (Section 2) allows the readers to become familiar with the basics of this transform. There are different ways to obtain log-polar images either from conventional images or directly from a scene, using software- and/or hardware-based solutions (Section 3). In Section 3 we also address issues regarding how the mapping parameters may influence the visual task and whether the selection of optimal parameters can be automated. The area of visual attention and salience computation under foveal vision (Section 4) has not been explored very much, even though it plays a key role in active object search and recognition, in exploratory gaze strategies, and in the proper integration of different visual tasks in practical scenarios. One of the visual processes where log-polar imaging is most suitable is probably active target tracking (Section 5), and substantial research has been devoted to this topic. Some advantages have also been found in estimating the observer’s motion using log-polar images (Section 6), basically due to its polar geometric nature which fits particularly well with time-to-collision computation and other navigation tasks in mobile robots. Binocular depth estimation has been considered with a joint usage of log-polar imaging and active vergence movements (Section 7). There are also a number of less conventional sensor arrangements and less known properties of log-polar imaging that deserve some consideration. It is our prediction that many of these issues will open up the door to fascinating new research challenges in automatic foveal vision not only within robotics but also in other fields of application (Section 8).

Section snippets

Log-polar mapping

Log-polar mapping is a geometrical image transformation that attempts to emulate the topological reorganization of visual information from the retina to the visual cortex of primates. It can also be found in the literature under different names, such as log-polar transformation or the log(z) model. The reason for this last denomination comes from the fact that the mapping can be mathematically modeled by the complex logarithmic function log(z), where z is the complex variable representing
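To make the mapping concrete, a minimal sketch of log-polar resampling of a Cartesian image follows (the grid sizes, the inner radius, and nearest-neighbour sampling are illustrative choices, not a model from any particular paper discussed in this survey):

```python
import numpy as np

def logpolar_sample(img, rho_min=2.0, n_rings=64, n_wedges=128):
    """Resample a square grayscale image onto a log-polar grid
    centred on the image centre, using nearest-neighbour sampling.
    Rows of the output are rings (log-radius), columns are wedges
    (angle)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho_max = min(cx, cy)
    # Ring radii grow exponentially: rho(u) = rho_min * a**u,
    # so u = log_a(rho / rho_min) is the log-radial coordinate.
    a = (rho_max / rho_min) ** (1.0 / (n_rings - 1))
    out = np.zeros((n_rings, n_wedges), dtype=img.dtype)
    for u in range(n_rings):
        rho = rho_min * a ** u
        for v in range(n_wedges):
            theta = 2.0 * np.pi * v / n_wedges
            x = int(round(cx + rho * np.cos(theta)))
            y = int(round(cy + rho * np.sin(theta)))
            out[u, v] = img[y, x]
    return out
```

Note how the sampling density in the original image decreases with eccentricity: successive rings are spaced ever farther apart, which is precisely the data-reduction property exploited throughout this survey.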

Sensor design

While the market for commercial cameras is dominated by sensors with conventional Cartesian lattices, research and development on foveal imaging during the 90s led to the design and construction of several prototypes of the so-called retina-like cameras, with sensitive elements arranged following a log-polar pattern. The first versions of these sensors and cameras were based on Charge-Coupled Device (CCD) technology [60], [61], [62], [63], [64], while later models used the

Visual attention

In animals, attention is a cognitive process through which only a reduced subset of all sensory stimuli in the environment is selected. Paying attention to all the information at all times would clearly overwhelm the animal’s cognitive capabilities. In robots, this selection is also required since the computational resources are limited, and it is also beneficial to automatically select and process only the most interesting parts. Regarding visual perception, models of primate visual attention 

Visual tracking

Visual tracking consists of following, over time, the movements of one or more moving objects within the visual field of a given observer. When the camera dynamically changes its parameters to track a moving object [147], this can be referred to as active tracking. In contrast, passive tracking simply keeps a record of the target position without moving the camera, and therefore entails a smaller field of view. A simple taxonomy of visual tracking may consider two dimensions:
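One reason log-polar images suit tracking is that, for a pattern centred on the fovea, an in-plane rotation of the target becomes a circular shift along the angular axis of the cortical image. A minimal sketch of exploiting this property (the function name and the brute-force correlation search are illustrative, not any specific tracker from the literature):

```python
import numpy as np

def estimate_rotation(cortex_a, cortex_b):
    """Estimate the in-plane rotation (in degrees) between two
    log-polar ('cortical') images of the same foveated pattern.
    A rotation about the fovea is a circular shift along the
    angular (column) axis, so we pick the shift that maximises
    the correlation between the two images."""
    n_wedges = cortex_a.shape[1]
    scores = [np.sum(cortex_a * np.roll(cortex_b, -s, axis=1))
              for s in range(n_wedges)]
    best = int(np.argmax(scores))
    return 360.0 * best / n_wedges
```

Scale changes of a foveated pattern likewise become shifts along the radial (row) axis, so the same correlation search over rows would recover the scale factor.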

Egomotion estimation

Egomotion is the motion of the observer itself (a pan–tilt camera, a robot arm, or any other dynamic visual agent). Its estimation is important in those situations where knowledge of the actual robot movement is required, either for self-localization or to distinguish that movement from the motion of objects in the scene. Although egomotion estimation can be tackled with general motion-estimation tools, some specific approaches have been proposed that exploit the structure of the task at hand. Two common characteristics
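One property often invoked in this context (sketched here under simplifying assumptions: pure translation towards a frontal surface, with tracked features already expressed in log-radial coordinates) is that radial image expansion becomes a uniform drift of the log-radial coordinate, whose rate is the inverse of the time to collision. The function name below is illustrative:

```python
import numpy as np

def ttc_from_logpolar_shift(u_t0, u_t1, dt):
    """Under pure translation towards a frontal surface, every
    feature's log-radial coordinate u = log(rho) drifts at the same
    rate du/dt = (drho/dt)/rho = 1/TTC, independently of where the
    feature sits in the image. Averaging the observed shifts over
    features gives a simple time-to-collision estimate."""
    du = np.mean(np.asarray(u_t1) - np.asarray(u_t0))
    return dt / du
```

The appeal for log-polar sensors is that this drift is a plain translation of the cortical image, so no per-feature normalisation by eccentricity is needed.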

3D cues and vergence control

One of the most important capabilities of autonomous robots is their ability to perceive the three-dimensional structure of the environment, in order to avoid obstacles, recognize shapes, and manipulate objects. Several cues for depth perception can be used from an image stream. In monocular systems, motion, focus and shading cues (amongst others) have been systematically used to address the problem of depth perception. In binocular systems, depth can be computed via stereo, a conceptually
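As a back-of-the-envelope illustration of the vergence route to depth (assuming a symmetric stereo head fixating a point on its midline; the function name is illustrative), the depth of the fixation point follows from the baseline and the vergence angle alone:

```python
import math

def depth_from_vergence(baseline, vergence_angle):
    """Depth of the fixation point of a symmetrically verging stereo
    head. Each camera turns inwards by vergence_angle/2 towards a
    point on the midline, so simple trigonometry gives
    Z = (baseline/2) / tan(vergence_angle/2)."""
    return (baseline / 2.0) / math.tan(vergence_angle / 2.0)
```

Once the head fixates a target, the residual disparity of nearby points around the (near zero-disparity) fovea refines this coarse depth estimate, which is one of the complementarities between log-polar imaging and vergence control discussed in this section.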

A look to the future

The research on log-polar imaging, with emphasis on the robot vision community, has been reviewed in this article. In the past few years, fundamental properties of this image representation have been studied and exploited in appropriate algorithms.

One criticism that can naturally be posed is whether, all things considered, it is really worth adopting log-polar images. Regarding the computational advantages, it could be argued that, after all, almost any task solved with log-polar images can

Acknowledgements

The authors acknowledge the Integrated Actions programs funded by the Portuguese Government and Spanish Ministry of Science and Education, through projects E 47-06 and HP2005-0095, respectively, under the topic “FOVEAR: Foveal vision for robotic applications”, and the Spanish research programme Consolider Ingenio-2010 CSD2007-00018. The authors are also grateful to the anonymous reviewers for their comments, the “Servei de Llengües i Terminologia” at Universitat Jaume I for their professional

V. Javier Traver earned a B.Sc. degree in Computer Science from Technical University of Valencia (Valencia, Spain) in 1995, and a Ph.D. degree in Computer Engineering from Universitat Jaume I (Castellón, Spain) in 2002. Since 2000, he has been a full-time lecturer of Computer Engineering at Universitat Jaume I. His research interests include foveal imaging, active vision, and image sequence analysis.

References (249)

  • J.A. Rodríguez et al.

    A multiresolution spatiotemporal motion segmentation technique for video sequences based on pyramidal structures

    Pattern Recognition Letters

    (2002)
  • R. Marfil et al.

    Pyramid segmentation algorithms revisited

    Pattern Recognition

    (2006)
  • J. Dias et al.

    Optical normal flow estimation on log-polar images. A solution for real-time binocular vision

    Real Time Imaging

    (1997)
  • F.L. Lim et al.

    Resolution consideration in spatially variant sensors

    Image and Vision Computing (IVC)

    (1997)
  • M. Ujaldón et al.

    On the computation of the circle Hough transform by a GPU rasterizer

    Pattern Recognition Letters

    (2008)
  • V.J. Traver et al.

    Log-polar mapping template design: From task-level requirements to geometry parameters

    Image and Vision Computing (IVC)

    (2008)
  • S. Edelman

    Receptive fields for vision: From hyperaccuity to object recognition

  • A. Bernardino et al.

    Binocular tracking: Integrating perception and control

    IEEE Transactions on Robotics and Automation

    (1999)
  • C.F.R. Weiman, Video compression via log-polar mapping, in: SPIE Symposium on OE/Aerospace Sensing, Orlando, Florida,...
  • F. Panerai, C. Capurro, G. Sandini, Space variant vision for an active camera mount. Technical Report TR 1/95, LIRA,...
  • C. Breazeal et al.

    Active vision for sociable robots

IEEE Transactions on Systems, Man, and Cybernetics, Part A

    (2001)
  • B. Bederson, A miniature space-variant active vision system, Ph.D. Thesis, New York University,...
  • P. Daniel et al.

    The representation of the visual field on the cerebral cortex in monkeys

    Journal of Physiology

    (1961)
  • T. Lindeberg, L. Florack, Foveal scale-space and the linear increase of receptive field size as a function of...
  • R. Tootell et al.

Deoxyglucose analysis of retinotopic organization in primate striate cortex

    Science

    (1982)
  • E.L. Schwartz

    Spatial mapping in the primate sensory projection: Analytic structure and relevance to perception

    Biological Cybernetics

    (1977)
  • J.Y. Aloimonos et al.

    Active vision

    International Journal of Computer Vision (IJCV)

    (1988)
  • R. Bajcsy

    Active perception and exploratory robotics

  • M. Swain et al.

    Promising directions in active vision

    International Journal of Computer Vision (IJCV)

    (1993)
  • G. Sandini et al.

    Retina-like sensors: Motivations, technology and applications

  • F. Berton et al.

    Anthropomorphic visual sensors

  • M. Yeasin et al.

    Foveated vision sensor and image processing—A review

  • J.C. Wilson, R.M. Hodgson, A pattern recognition system based on models of aspects of the human visual system, in: 4th...
  • V.J. Traver et al.

    The log-polar image representation in pattern recognition tasks

  • R. Wallace et al.

    Space variant image processing

    International Journal of Computer Vision (IJCV)

    (1994)
  • C. Capurro et al.

    Dynamic vergence using log-polar images

    International Journal of Computer Vision (IJCV)

    (1997)
  • L. Florack

    Modeling foveal vision

  • R.S. Wallace, P.-W. Ong, B.B. Bederson, E.L. Schwartz, Space-variant image processing, Technical Report 589-R256,...
  • R.S. Wallace, P.-W. Ong, B.B. Bederson, E.L. Schwartz, Space variant image processing, Technical Report 633, Courant...
  • J.A. Boluda, F. Pardo, T. Kayser, J.J. Pérez, J. Pelechano, A new foveated space-variant camera for robotic...
  • F. Pardo, J.A. Boluda, J.J. Pérez, B. Dierickx, D. Scheffer, Design issues on CMOS space-variant image sensors, in:...
  • R. Wodnicki et al.

    A log-polar image sensor fabricated in a standard 1.2-μm ASIC CMOS process

    Journal of Solid-State Circuits

    (1997)
  • F. Pardo et al.

    Space-variant nonorthogonal structure CMOS image sensor design

    Journal of Solid-State Circuits

    (1998)
  • S. Meikle, R. Yates, Building smarter sensors—Lessons learned from computer vision, in: Intelligent Vehicles Symposium,...
  • A.M. Wallace et al.

    Gradient detection in discrete log-polar images

    Pattern Recognition Letters

    (2003)
  • B. Fischl et al.

    Rapid anisotropic diffusion using space-variant vision

    International Journal of Computer Vision (IJCV)

    (1998)
  • S.W. Smith

    The Scientist and Engineer’s Guide to Digital Signal Processing

    (1997)
  • B.G. Borghuis et al.

    Design of a neuronal array

    The Journal of Neuroscience

    (2008)
  • V.J. Traver et al.

    Motion analysis with the Radon transform on log-polar images

    Journal of Mathematical Imaging and Vision

    (2008)


    Alexandre Bernardino received the Ph.D. degree in Electrical and Computer Engineering in 2004 from Instituto Superior Técnico (IST). He is an Assistant Professor at IST and Researcher at the Institute for Systems and Robotics (ISR-Lisboa) in the Computer Vision Laboratory (VisLab). He participates in several national and international research projects in the fields of robotics, cognitive systems, computer vision and surveillance. He has published several articles in international journals and conferences, and his main research interests focus on the application of computer vision, cognitive science and control theory to advanced robotic and surveillance systems.
