A review of log-polar imaging for visual perception in robotics
Introduction
Both natural and artificial visual systems have to deal with large amounts of information coming from the surrounding environment. When real-time operation is required, as happens with animals or robots in dynamic and unstructured environments, image acquisition and processing must be performed in a very short time (a few milliseconds) in order to provide a sufficiently fast response to external stimuli. Appropriate sensor geometries and image representations are essential for the efficiency of the full visual processing stream. To address this problem it is wise to look at the solutions present in biological systems, which have been optimized by millions of years of evolution. For instance, the visual system of many animals exhibits a non-uniform structure, where the receptive fields represent certain parts of the visual field more densely and acutely. In the case of mammals, whose eyes are able to move, retinas present a single high-resolution area in the center of the visual field, called the fovea. The distribution of receptive fields within the retina is fixed, and the fovea can be redirected to other targets by ocular movements. The same structure is also commonly used in robot systems with moving cameras [2], [3], [4], [5], [6].
In the late 70s computer vision researchers broke new ground by considering the foveal nature of the visual systems of primates as an alternative to conventional uniform resolution sensors for artificial perception in computers and robots. Earlier on, biological findings in the visual cortex of monkeys [7] had shown that the displacement of a light stimulus in the retina produces displacements in the cortex that are inversely proportional to the distance to the fovea. This effect, also known as cortical magnification, indicates a general scaling behavior by which both receptive field spacing and size increase linearly with eccentricity, i.e. the distance from the fovea [8]. It was found that responses to linear stimuli originating in the fovea lie roughly along lines in the cortex, and circular stimuli centered on the fovea produce linear responses in the cortex at approximately orthogonal orientations [9]. Thus, the information transmitted between the retina and the visual cortex is organized in an approximate logarithmic-polar law [10].
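The geometric content of this logarithmic-polar law can be checked numerically: under the mapping w = log(z), rays through the fovea and circles centred on it become orthogonal straight lines in the cortical (log-radius, angle) plane. A minimal sketch in plain Python, illustrative only:

```python
import math

def to_logpolar(x, y):
    """Map a Cartesian (retinal) point to log-polar (cortical) coordinates."""
    rho = math.log(math.hypot(x, y))   # log of eccentricity
    theta = math.atan2(y, x)           # angular position
    return rho, theta

# Points on a ray through the fovea (constant angle) map to a line of
# constant theta, whatever their eccentricity.
ray = [to_logpolar(r * math.cos(0.5), r * math.sin(0.5)) for r in (1, 2, 4, 8)]
assert all(abs(t - 0.5) < 1e-9 for _, t in ray)

# Points on a circle centred on the fovea (constant radius) map to a line
# of constant rho, orthogonal to the previous one.
circle = [to_logpolar(3 * math.cos(a), 3 * math.sin(a)) for a in (0.1, 0.7, 1.3)]
assert all(abs(r - math.log(3)) < 1e-9 for r, _ in circle)
```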
The foveal structure of the retina of some animals is, together with their ability to move the eyes, a fundamental mechanism in the control of visual perception. In the late 80s and early 90s, researchers started exploiting eye movements to achieve complex visual tasks. The paradigm of active vision emerged as a powerful concept to endow an active observer with the ability to find more efficient solutions to problems that, from a passive vision perspective, were ill-posed and non-linear [11]. The idea can be generalized beyond pure perception by including manipulation. The smart usage of robot arms and hands, for instance, opens up many more possibilities for better visual perception [12]. A great deal of excitement was aroused at that time with regard to the present and future possibilities of active vision [13]. For instance, by purposefully moving the eyes, an observer with a foveal low resolution sensor can acquire a “virtual” high resolution image of its entire field of view [14]. Therefore, following on as the next natural step forward, a new concept appeared —space-variant active vision [15].
Since then, efforts have been made to explore the advantages that foveal-like log-polar imaging can bring to robotic applications. However, almost three decades after those initial studies, no systematic and comprehensive work has been published that reviews past research on the topic. While a careful review of log-polar models was conducted ten years ago in [16], its focus was a detailed study and comparison of log-polar mapping templates and models with overlapping receptive fields. More recently, the motivations for retina-like sensors and the properties of the log-polar mapping were nicely considered in [17], [18]. Another paper [19] surveyed foveated sensors with a particular emphasis on image processing issues.
Since these few review-like papers did not consider the applications and usages of log-polar images in depth, we feel that such an analysis is needed, to reflect on past achievements, discuss current challenges, and predict future developments. Additionally, this literature overview should be a valuable aid to any researcher interested in approaching the field, particularly to beginners. Finally, another important benefit of such a survey is that of helping to promote further work, on both the theoretical and practical sides of log-polar vision.
Therefore, the present survey aims to complement these previous reviews, by looking further into the variety of applications of log-polar sensing that have been proposed. Furthermore, our analysis pays particular attention to robotic applications. Hence, the usage of the log-polar transform for pattern recognition issues, though important, is not considered here. Studies and reviews on this other perspective can be found elsewhere [20], [21], [22], [23].
An overview of the log-polar mapping (Section 2) allows the readers to become familiar with the basics of this transform. There are different ways to obtain log-polar images, either from conventional images or directly from a scene, using software- and/or hardware-based solutions (Section 3). In Section 3 we also address how the mapping parameters may influence the visual task and whether the selection of optimal parameters can be automated. The area of visual attention and salience computation under foveal vision (Section 4) has not been explored very much, even though it plays a key role in active object search and recognition, in exploratory gaze strategies, and in the proper integration of different visual tasks in practical scenarios. One of the visual processes where log-polar imaging is most suitable is probably active target tracking (Section 5), and substantial research has been devoted to this topic. Some advantages have also been found in estimating the observer’s motion using log-polar images (Section 6), largely due to their polar geometry, which fits particularly well with time-to-collision computation and other navigation tasks in mobile robots. Binocular depth estimation has been considered with a joint usage of log-polar imaging and active vergence movements (Section 7). There are also a number of less conventional sensor arrangements and less known properties of log-polar imaging that deserve some consideration. It is our prediction that many of these issues will open the door to fascinating new research challenges in automatic foveal vision, not only within robotics but also in other fields of application (Section 8).
Section snippets
Log-polar mapping
Log-polar mapping is a geometrical image transformation that attempts to emulate the topological reorganization of visual information from the retina to the visual cortex of primates. It can also be found in the literature under different names, such as the log-polar transformation or the log(z) model. The reason for this last denomination comes from the fact that the mapping can be mathematically modeled by the complex logarithmic function w = log(z), where z is the complex variable representing
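As a concrete illustration of the mapping w = log(z), the following sketch resamples a conventional Cartesian image into a log-polar image by inverse-mapping each (ring, sector) cell back to Cartesian coordinates and taking the nearest pixel. The parameter names and defaults (number of rings and sectors, minimum radius) are illustrative choices, not those of any particular sensor model from the literature:

```python
import math

def logpolar_sample(img, cx, cy, n_rings=32, n_sectors=64, rho_min=1.0, rho_max=None):
    """Resample a 2D image (list of rows) into an n_rings x n_sectors
    log-polar image centred at (cx, cy), using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    if rho_max is None:
        # Largest circle fully contained in the image.
        rho_max = min(cx, cy, w - 1 - cx, h - 1 - cy)
    # Ring radii grow exponentially: equal steps in log(rho).
    k = math.log(rho_max / rho_min) / (n_rings - 1)
    out = []
    for u in range(n_rings):
        rho = rho_min * math.exp(k * u)
        row = []
        for v in range(n_sectors):
            theta = 2 * math.pi * v / n_sectors
            x = int(round(cx + rho * math.cos(theta)))
            y = int(round(cy + rho * math.sin(theta)))
            row.append(img[y][x])
        out.append(row)
    return out
```

Note how the exponential ring spacing concentrates samples near the centre (the fovea) while covering the periphery with few, widely spaced cells.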
Sensor design
While the market of commercial visual cameras is dominated by sensors with conventional Cartesian lattices, research and development on foveal imaging during the 90s led to the design and construction of several prototypes of so-called retina-like cameras, with sensitive elements arranged following a log-polar pattern. The first versions of these sensors and cameras were based on Charge-Coupled Device (CCD) technology [60], [61], [62], [63], [64], while later models used the
Visual attention
In animals, attention is a cognitive process through which only a reduced subset of all sensory stimuli in the environment is selected. Paying attention to all the information at all times would clearly overwhelm the animal’s cognitive capabilities. In robots, this selection is also required since the computational resources are limited, and it is also beneficial to automatically select and process only the most interesting parts. Regarding visual perception, models of primate visual attention
Visual tracking
Visual tracking consists of following over time the movements of one or more moving objects within the visual field of a given observer. When the camera dynamically changes its parameters to track a moving object [147], this is referred to as active tracking. In contrast, passive tracking just keeps a record of the target position without moving the camera, and therefore entails a smaller field of view. A simple taxonomy of visual tracking may consider two dimensions:
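One reason log-polar images suit active tracking of a foveated target is that, for a target centred on the fovea, a change of apparent size (looming) or an in-plane rotation becomes a pure translation in the (log-radius, angle) plane, so simple shift-based matching copes with both. A small numerical check of this well-known property (plain Python, illustrative only):

```python
import math

def logpolar(x, y):
    """Cartesian point -> (log-radius, angle)."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)

p = (3.0, 4.0)
rho0, th0 = logpolar(*p)

# Scaling the point by s shifts log-rho by log(s); theta is unchanged.
s = 2.5
rho1, th1 = logpolar(s * p[0], s * p[1])
assert abs((rho1 - rho0) - math.log(s)) < 1e-9
assert abs(th1 - th0) < 1e-9

# Rotating by alpha shifts theta by alpha; log-rho is unchanged.
a = 0.3
c, d = math.cos(a), math.sin(a)
rho2, th2 = logpolar(c * p[0] - d * p[1], d * p[0] + c * p[1])
assert abs(rho2 - rho0) < 1e-9
assert abs((th2 - th0) - a) < 1e-9
```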
Egomotion estimation
Egomotion is the motion of the observer (a pan–tilt camera, a robot arm, or any other dynamic visual agent). Its estimation is important in those situations where the knowledge of actual robot movement is required either for self-localization, or to distinguish it from the motion of objects in the scene. Although egomotion estimation can be tackled with the general tools for motion estimation, some specific approaches have been proposed that exploit the task at hand. Two common characteristics
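A property often exploited in this context (e.g. for time-to-collision computation) is that, for a point approached along the optical axis under pinhole projection, the image radius grows as rho(t) = fR/Z(t), so the rate of change of the log-radial coordinate directly equals the inverse of the time to collision. A numeric sketch under these assumptions (all constants below are made-up example values):

```python
import math

# Point at lateral distance R, approached at constant speed v from depth Z0.
# Pinhole projection: image radius rho(t) = f * R / (Z0 - v * t).
f, R, Z0, v = 1.0, 0.5, 10.0, 2.0
ttc_true = Z0 / v  # time to collision at t = 0

def log_rho(t):
    """Log-radial coordinate of the projected point at time t."""
    return math.log(f * R / (Z0 - v * t))

# Rate of change of log(rho) at t = 0, by central finite differences.
dt = 1e-6
d_log_rho = (log_rho(dt) - log_rho(-dt)) / (2 * dt)

# In log-polar coordinates this rate is exactly 1 / time-to-collision,
# independently of the unknown constants f and R.
assert abs(d_log_rho - 1.0 / ttc_true) < 1e-6
```

Note that f and R cancel out: the inverse time to collision is read directly from the log-radial image velocity, with no calibration of focal length or scene scale.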
3D cues and vergence control
One of the most important capabilities of autonomous robots is their ability to perceive the three-dimensional structure of the environment, in order to avoid obstacles, recognize shapes, and manipulate objects. Several cues for depth perception can be used from an image stream. In monocular systems, motion, focus and shading cues (amongst others) have been systematically used to address the problem of depth perception. In binocular systems, depth can be computed via stereo, a conceptually
A look to the future
The research on log-polar imaging, with emphasis on the robot vision community, has been reviewed in this article. In the past few years, fundamental properties of this image representation have been studied and exploited in appropriate algorithms.
One criticism that can naturally be posed is whether, taking all into account, it is really worth adopting log-polar images. Regarding the computational advantages, it could be argued that, after all, almost any task solved with log-polar images can
Acknowledgements
The authors acknowledge the Integrated Actions programs funded by the Portuguese Government and Spanish Ministry of Science and Education, through projects E 47-06 and HP2005-0095, respectively, under the topic “FOVEAR: Foveal vision for robotic applications”, and the Spanish research programme Consolider Ingenio-2010 CSD2007-00018. The authors are also grateful to the anonymous reviewers for their comments, the “Servei de Llengües i Terminologia” at Universitat Jaume I for their professional
V. Javier Traver earned a B.Sc. degree in Computer Science from Technical University of Valencia (Valencia, Spain) in 1995, and a Ph.D. degree in Computer Engineering from Universitat Jaume I (Castellón, Spain) in 2002. Since 2000, he has been a full-time lecturer of Computer Engineering at Universitat Jaume I. His research interests include foveal imaging, active vision, and image sequence analysis.
References (249)
Animate vision, Artificial Intelligence (1991)
Space-variant active vision: Definition, overview and examples, Neural Networks (1995)
A review of biologically motivated space-variant data reduction models for robotic vision, Computer Vision and Image Understanding (CVIU) (1998)
‘Form-Invariant’ topological mapping strategy for 2D shape recognition, Computer Vision, Graphics, and Image Processing (1985)
An image processing architecture for real time generation of scale and rotation invariant patterns, Computer Vision, Graphics, and Image Processing (1985)
Computational anatomy and functional architecture of the striate cortex, Vision Research (1980)
A new log-polar mapping for space variant imaging. Application to face detection and tracking, Pattern Recognition (1999)
On the retino-cortical mapping, International Journal on Man-Machine Studies (1983)
An anthropomorphic retina-like structure for scene analysis, Computer Graphics and Image Processing (1980)
A real-time foveated sensor with overlapping receptive fields, Real-Time Imaging: Special Issue on Natural and Artificial Imaging and Vision (1997)
A multiresolution spatiotemporal motion segmentation technique for video sequences based on pyramidal structures, Pattern Recognition Letters
Pyramid segmentation algorithms revisited, Pattern Recognition
Optical normal flow estimation on log-polar images. A solution for real-time binocular vision, Real-Time Imaging
Resolution consideration in spatially variant sensors, Image and Vision Computing (IVC)
On the computation of the circle Hough transform by a GPU rasterizer, Pattern Recognition Letters
Log-polar mapping template design: From task-level requirements to geometry parameters, Image and Vision Computing (IVC)
Receptive fields for vision: From hyperacuity to object recognition
Binocular tracking: Integrating perception and control, IEEE Transactions on Robotics and Automation
Active vision for sociable robots, IEEE Transactions on Systems, Man and Cybernetics Part A
The representation of the visual field on the cerebral cortex in monkeys, Journal of Physiology
Deoxyglucose analysis of retinotopic organization in primate striate cortex, Science
Spatial mapping in the primate sensory projection: Analytic structure and relevance to perception, Biological Cybernetics
Active vision, International Journal of Computer Vision (IJCV)
Active perception and exploratory robotics
Promising directions in active vision, International Journal of Computer Vision (IJCV)
Retina-like sensors: Motivations, technology and applications
Anthropomorphic visual sensors
Foveated vision sensor and image processing—A review
The log-polar image representation in pattern recognition tasks
Space variant image processing, International Journal of Computer Vision (IJCV)
Dynamic vergence using log-polar images, International Journal of Computer Vision (IJCV)
Modeling foveal vision
A log-polar image sensor fabricated in a standard 1.2-μm ASIC CMOS process, IEEE Journal of Solid-State Circuits
Space-variant nonorthogonal structure CMOS image sensor design, IEEE Journal of Solid-State Circuits
Gradient detection in discrete log-polar images, Pattern Recognition Letters
Rapid anisotropic diffusion using space-variant vision, International Journal of Computer Vision (IJCV)
The Scientist and Engineer’s Guide to Digital Signal Processing
Design of a neuronal array, The Journal of Neuroscience
Motion analysis with the Radon transform on log-polar images, Journal of Mathematical Imaging and Vision
Alexandre Bernardino received the Ph.D. degree in Electrical and Computer Engineering in 2004 from Instituto Superior Técnico (IST). He is an Assistant Professor at IST and Researcher at the Institute for Systems and Robotics (ISR-Lisboa) in the Computer Vision Laboratory (VisLab). He participates in several national and international research projects in the fields of robotics, cognitive systems, computer vision and surveillance. He has published several articles in international journals and conferences, and his main research interests focus on the application of computer vision, cognitive science and control theory to advanced robotic and surveillance systems.