Description of interest regions with local binary patterns
Introduction
Local image feature detection and description have received a lot of attention in recent years. The basic idea is to first detect interest regions that are covariant to a class of transformations. Then, for each detected region, an invariant descriptor is built. Once we have the descriptors computed, we can match interest regions between images. This approach has many advantages. For example, local features can be made very tolerant to illumination changes, perspective distortions, image blur, image zoom, and so on. The approach is also very robust to occlusion. Local features have performed very well in many computer vision applications, such as image retrieval [1], wide baseline matching [2], object recognition [3], texture recognition [4], and robot localization [5].
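The pipeline above ends by matching descriptors between images. Below is a minimal sketch of that last step. It assumes each descriptor is a fixed-length NumPy vector, that descriptors are compared by Euclidean distance, and that plain nearest-neighbor search is used. The function name `match_descriptors` is hypothetical, and practical systems usually add a distance threshold or a ratio test.

```python
import numpy as np

def match_descriptors(desc_a: np.ndarray, desc_b: np.ndarray) -> list[tuple[int, int, float]]:
    """Match each row of desc_a to its nearest row in desc_b (Euclidean distance).

    Hypothetical helper: plain nearest-neighbor matching, with no threshold
    and no ratio test. Returns (index_in_a, index_in_b, distance) triples.
    """
    # Pairwise differences via broadcasting: shape (len(a), len(b), dim).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dists = np.sqrt(np.sum(diff ** 2, axis=2))
    # For every descriptor in A, pick the closest descriptor in B.
    nearest = np.argmin(dists, axis=1)
    return [(i, int(j), float(dists[i, j])) for i, j in enumerate(nearest)]
```

For example, matching `[[0, 0], [1, 1]]` against `[[1, 1.1], [0, 0.1]]` pairs the first region with the second candidate and the second region with the first, each at distance 0.1.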
The interest regions that are used as input to region description methods are provided by the interest region detectors. Many different approaches to region detection have been proposed. For example, some detectors detect corner-like regions while others extract blobs. Since this paper focuses on interest region description, we refer the reader to Ref. [6] for more information on interest region detection.
As with interest region detection, many different approaches to interest region description have been proposed. The methods emphasize different image properties such as pixel intensities, color, texture, and edges. Many of the proposed descriptors are distribution-based, i.e. they use histograms to represent different characteristics of appearance or shape. The intensity-domain spin image [4] is a 2D histogram whose dimensions are the distance from the center point and the intensity value. The SIFT descriptor [3] is a 3D histogram of gradient locations and orientations, where the contribution to the location and orientation bins is weighted by the gradient magnitude and by a Gaussian window overlaid on the region. Very similar to SIFT is the GLOH descriptor [7], which replaces the Cartesian location grid used by SIFT with a log-polar one and applies PCA to reduce the size of the descriptor. Another SIFT-like descriptor with a log-polar location grid is the extension of the shape context presented in Ref. [7], which is a 3D histogram of edge point locations and orientations. The original shape context was computed only from edge point locations, not orientations [8]. The SURF method [9] builds on the strengths of leading existing detectors and descriptors: it uses a Hessian matrix-based measure for the detector and Haar wavelet responses for the descriptor, and by relying on integral images for image convolutions it significantly reduces computation time. In Ref. [10], geodesic sampling is used to obtain neighborhood samples for interest points, and a geodesic-intensity histogram (GIH) is then used as a deformation-invariant local descriptor. Other interest region descriptors proposed in the literature include PCA-SIFT [11], steerable filters [12], moment invariants [13], and complex filters [14].
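The following sketch makes the structure of the SIFT histogram concrete. It is a simplified illustration, not Lowe's implementation. It assumes a square, already normalized grayscale patch, a 4 × 4 spatial grid with 8 orientation bins (128 dimensions), and hard binning instead of trilinear interpolation. The width of the Gaussian weight (`sigma_frac`) is an illustrative choice, and the function name `sift_like_descriptor` is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, n_orient=8, sigma_frac=0.5):
    """Simplified SIFT-style histogram of gradient locations and orientations.

    Assumptions: no rotation normalization and no interpolation between bins;
    each pixel votes into exactly one spatial cell and one orientation bin.
    """
    patch = np.asarray(patch, dtype=float)
    h, w = patch.shape
    gy, gx = np.gradient(patch)                   # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # gradient orientation in [0, 2*pi)

    # Gaussian window centered on the patch (width is an assumed parameter).
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    sigma = sigma_frac * w
    weight = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

    # Bin indices: spatial cell (row, column) and orientation bin.
    cell_r = np.minimum(ys * grid // h, grid - 1)
    cell_c = np.minimum(xs * grid // w, grid - 1)
    ori_bin = np.minimum((ori / (2 * np.pi) * n_orient).astype(int), n_orient - 1)

    # Each vote is weighted by gradient magnitude times the Gaussian window.
    hist = np.zeros((grid, grid, n_orient))
    np.add.at(hist, (cell_r, cell_c, ori_bin), mag * weight)
    vec = hist.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

On a horizontal intensity ramp all gradients point along the x axis, so every cell puts its whole mass into orientation bin 0. A constant patch has no gradient and yields an all-zero vector.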
There exist several recent comparative studies on region descriptors [7], [15], [16]. Almost without exception, the best results are reported for distribution-based descriptors such as SIFT. Recently, an interesting study on region descriptors was published in Ref. [17]. The authors break up the descriptor extraction process into a number of modules and put these together in different combinations. Many of these combinations give rise to published descriptors such as SIFT, but many are untested. Furthermore, learning is used to optimize the choice of parameter values for each candidate descriptor algorithm.
Many existing texture operators have not yet been used for describing interest regions [18]. One reason might be that these methods usually require a large number of dimensions to build a reliable descriptor. The local binary pattern (LBP) texture operator [19], [20], [21] has been highly successful in various computer vision problems such as face recognition [22], background subtraction [23], and recognition of 3D textured surfaces [24], but it has not been used for describing interest regions so far. The LBP has properties that favor its use in interest region description, such as tolerance to illumination changes and computational simplicity. Its drawbacks are that the operator produces a rather long histogram and is not very robust in flat image areas. To address these problems, we propose in this paper a new LBP-based texture feature, the center-symmetric local binary pattern (CS-LBP), which is better suited to the given problem.
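The basic 8-neighbor LBP operator thresholds each neighbor against the center pixel and packs the results into an 8-bit code, so its histogram has 2^8 = 256 bins. Below is a minimal sketch assuming a 3 × 3 neighborhood, the common convention that a bit is set when the neighbor is greater than or equal to the center, and a fixed circular neighbor order. The names `NEIGHBORS`, `lbp_codes`, and `lbp_histogram` are hypothetical, and border pixels are simply skipped.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbors in a fixed circular order (an assumption).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(img):
    """8-bit LBP code for every interior pixel (borders are skipped)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=int)
    for bit, (dy, dx) in enumerate(NEIGHBORS):
        # Shifted view so that neighbor[i, j] is the (dy, dx) neighbor of center[i, j].
        neighbor = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes += (neighbor >= center).astype(int) << bit
    return codes

def lbp_histogram(img):
    """256-bin histogram of LBP codes."""
    return np.bincount(lbp_codes(img).ravel(), minlength=256)
```

If one such histogram were computed per cell of a SIFT-like 4 × 4 grid (an assumed layout), the concatenated descriptor would already have 16 × 256 = 4096 dimensions, which illustrates the "rather long histogram" drawback mentioned above.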
Since SIFT and other distribution-based descriptors similar to it [7], [8], [9] have shown state-of-the-art performance in different problems, we decided to focus on this approach. We were especially interested in whether the gradient orientation and magnitude-based feature used in the SIFT algorithm could be replaced by a different feature offering better or comparable performance. In this paper, we propose a new interest region descriptor, denoted the CS-LBP descriptor, that combines the good properties of SIFT and LBP. This is achieved by adopting the SIFT descriptor and using the novel CS-LBP feature instead of the original gradient feature. The new feature allows several steps of the algorithm to be simplified, which makes the resulting descriptor computationally simpler than SIFT. It also appears to be more robust to illumination changes than the SIFT descriptor. A preliminary version of this article has appeared in Ref. [25].
The rest of the paper is organized as follows. In Section 2, we first briefly describe the starting point for our work, i.e. the SIFT and LBP methods. Sections 3 and 4 give details for the CS-LBP operator and the CS-LBP descriptor, respectively. The experimental evaluation is carried out in Section 5. Finally, we conclude the paper in Section 6.
SIFT and LBP methods
Before presenting in detail the CS-LBP operator and the CS-LBP descriptor, we give a brief review of the SIFT and LBP methods that form the basis for our work.
Center-symmetric local binary patterns
The LBP operator, described in Section 2.2, produces rather long histograms and is therefore difficult to use in the context of a region descriptor. To address this problem, we modified the scheme of how the pixels in the neighborhood are compared. Instead of comparing each pixel with the center pixel, we compare center-symmetric pairs of pixels, as illustrated in Fig. 1. This halves the number of comparisons for the same number of neighbors. We can see that for eight neighbors, LBP produces 256
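The center-symmetric comparison can be sketched as follows. With eight neighbors only four pairs are compared, so each code has 4 bits and the histogram has 2^4 = 16 bins. The sketch assumes pixel values in [0, 1] and sets a bit when the difference within a pair exceeds a small threshold. The default `threshold=0.01`, the neighbor ordering, and the names `cs_lbp_codes` and `cs_lbp_histogram` are assumptions for illustration, not values taken from this section.

```python
import numpy as np

# Neighbor i and neighbor i + 4 are center-symmetric (an assumed ordering).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def cs_lbp_codes(img, threshold=0.01):
    """4-bit CS-LBP code for every interior pixel (borders are skipped)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape

    def shifted(dy, dx):
        # View whose [i, j] entry is the (dy, dx) neighbor of interior pixel (i+1, j+1).
        return img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]

    half = len(NEIGHBORS) // 2
    codes = np.zeros((h - 2, w - 2), dtype=int)
    for i in range(half):
        a = shifted(*NEIGHBORS[i])
        b = shifted(*NEIGHBORS[i + half])   # the center-symmetric partner of a
        codes += (a - b > threshold).astype(int) << i
    return codes

def cs_lbp_histogram(img, threshold=0.01):
    """16-bin histogram of CS-LBP codes."""
    return np.bincount(cs_lbp_codes(img, threshold).ravel(), minlength=16)
```

On a constant region every code is 0, because no pair difference exceeds the threshold; a small nonzero threshold of this kind is one plausible way to gain stability on flat image areas. On a horizontal ramp only the two pairs with a horizontal component fire, giving code 4 + 8 = 12 everywhere.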
CS-LBP descriptor
In the following, we present our CS-LBP descriptor in detail. The input for the descriptor is a normalized interest region. The process is depicted in Fig. 2. The region detection and normalization steps are described in Section 5.1. In our experiments, the region size after normalization is fixed to pixels and the pixel values lie between 0 and 1.
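The descriptor construction can be sketched in the same SIFT-like way: compute CS-LBP codes over the normalized region, accumulate one 16-bin histogram per cell of a spatial grid, concatenate the histograms, and normalize. Everything beyond that outline is an assumption: the 4 × 4 grid (256 dimensions), hard assignment of pixels to cells instead of interpolation, SIFT-style clipping at 0.2 followed by renormalization, the threshold value, and the function names. The input is taken to be a 2D array with values between 0 and 1, as stated above.

```python
import numpy as np

# Neighbor i and neighbor i + 4 are center-symmetric (an assumed ordering).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def cs_lbp_codes(img, threshold=0.01):
    """4-bit CS-LBP code for every interior pixel (borders are skipped)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=int)
    for i in range(4):
        (ay, ax), (by, bx) = NEIGHBORS[i], NEIGHBORS[i + 4]
        a = img[1 + ay : h - 1 + ay, 1 + ax : w - 1 + ax]
        b = img[1 + by : h - 1 + by, 1 + bx : w - 1 + bx]
        codes += (a - b > threshold).astype(int) << i
    return codes

def cs_lbp_descriptor(region, grid=4, threshold=0.01, clip=0.2):
    """Hypothetical CS-LBP descriptor: grid x grid cells, 16 bins per cell."""
    codes = cs_lbp_codes(region, threshold)
    h, w = codes.shape
    # Hard assignment of each code to a spatial cell.
    rows = np.minimum(np.arange(h) * grid // h, grid - 1)
    cols = np.minimum(np.arange(w) * grid // w, grid - 1)
    r_idx, c_idx = np.meshgrid(rows, cols, indexing="ij")
    hist = np.zeros((grid, grid, 16))
    np.add.at(hist, (r_idx, c_idx, codes), 1.0)
    # SIFT-style normalization (assumed): unit length, clip large values, renormalize.
    vec = hist.ravel()
    vec = vec / max(np.linalg.norm(vec), 1e-12)
    vec = np.minimum(vec, clip)
    return vec / max(np.linalg.norm(vec), 1e-12)
```

Because each cell contributes only 16 bins, the concatenated vector stays at 4 × 4 × 16 = 256 dimensions under these assumptions, compared with 4096 for plain 8-neighbor LBP on the same grid.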
Experimental evaluation
We use two well-known protocols to evaluate the proposed CS-LBP descriptor. Both are freely available on the Internet. The first protocol is a matching protocol that is designed for matching interest regions between a pair of images [27]. The second protocol is the PASCAL Visual Object Classes Challenge 2006 protocol which is an object category classification protocol [28]. We compare the performance of the CS-LBP descriptor to the state-of-the-art descriptor SIFT. This allows us to evaluate
Conclusions
A new method for interest region description was presented. The proposed CS-LBP descriptor combines the strengths of two well-known methods, the SIFT descriptor and the LBP texture operator. It uses a SIFT-like grid and replaces SIFT's gradient features with an LBP-based feature, i.e. CS-LBP, which was also introduced in this paper. The CS-LBP feature has many properties that make it well suited for this task, namely a relatively short feature histogram, tolerance to illumination changes, and
Acknowledgment
The financial support provided by the Academy of Finland and the Infotech Oulu Graduate School is gratefully acknowledged.
About the Author—MARKO HEIKKILÄ received the M.Sc. degree in Electrical Engineering in 2004 from the University of Oulu, Finland. He was a researcher in the Machine Vision Group at the University of Oulu. Currently he is with Nokia Corporation.
References
- et al., A comparative study of texture measures with classification based on feature distributions, Pattern Recognition (1996).
- et al., View-based recognition of real-world textures, Pattern Recognition (2004).
- K. Mikolajczyk, C. Schmid, Indexing based on scale invariant interest points, in: 8th IEEE International Conference on...
- et al., Matching widely separated views based on affine invariant regions, Int. J. Comput. Vision (2004).
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision (2004).
- et al., A sparse texture representation using local affine regions, IEEE Trans. Pattern Anal. Mach. Intell. (2005).
- S. Se, D. Lowe, J. Little, Global localization using distinctive visual features, in: IEEE/RSJ International Conference...
- et al., A comparison of affine region detectors, Int. J. Comput. Vision (2005).
- et al., A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell. (2005).
- et al., Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell. (2002).
- The design and use of steerable filters, IEEE Trans. Pattern Anal. Mach. Intell.
About the Author—MATTI PIETIKÄINEN received the doctor of science in technology degree from the University of Oulu, Finland, in 1982. In 1981, he established the Machine Vision Group at the University of Oulu. This group has achieved a highly respected position in its field, and its research results have been widely exploited in industry. Currently, he is a professor of information engineering, scientific director of Infotech Oulu Research Center, and leader of the Machine Vision Group at the University of Oulu. His research interests include texture-based computer vision, face analysis, and their applications in human–computer interaction, person identification, and visual surveillance. From 1980 to 1981 and from 1984 to 1985, he visited the Computer Vision Laboratory at the University of Maryland. He has authored over 200 papers in international journals, books, and conference proceedings and about 100 other publications or reports. He has been an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and Pattern Recognition journals. He was guest editor (with L.F. Pau) of a two-part special issue on Machine Vision for Advanced Production for the International Journal of Pattern Recognition and Artificial Intelligence (also reprinted as a book by World Scientific in 1996). He was also the editor of the book Texture Analysis in Machine Vision (World Scientific, 2000) and has served as a reviewer for numerous journals and conferences. He was the president of the Pattern Recognition Society of Finland from 1989 to 1992. From 1989 to 2007, he served as a member of the Governing Board of the International Association for Pattern Recognition (IAPR) and became one of the founding fellows of the IAPR in 1994. He has also served on committees of several international conferences.
He was an area chair of IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’07) and is cochair of Workshops of International Conference on Pattern Recognition (ICPR ’08). He is a senior member of the IEEE, and was the vice-chair of IEEE Finland Section.
About the Author—CORDELIA SCHMID holds an M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis on “Local Greyvalue Invariants for Image Matching and Retrieval” received the best thesis award from INPG in 1996. She received the Habilitation degree in 2001 for her thesis entitled “From Image Matching to Learning Visual Models”. Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University in 1996–1997. Since 1997 she has held a permanent research position at INRIA Rhone-Alpes, where she is a research director and directs the INRIA team called LEAR for LEArning and Recognition in Vision. Dr. Schmid is the author of over eighty technical publications. She has been an Associate Editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (2001–2005) and for the International Journal of Computer Vision (2004–), and she was program chair of the 2005 IEEE Conference on Computer Vision and Pattern Recognition. In 2006, she was awarded the Longuet-Higgins prize for fundamental contributions in computer vision that have withstood the test of time. She is a senior member of IEEE.