Elsevier

Pattern Recognition

Volume 42, Issue 3, March 2009, Pages 425-436

Description of interest regions with local binary patterns

https://doi.org/10.1016/j.patcog.2008.08.014

Abstract

This paper presents a novel method for interest region description. We adopted the idea that the appearance of an interest region can be well characterized by the distribution of its local features. The most well-known descriptor built on this idea is the SIFT descriptor that uses gradient as the local feature. Thus far, existing texture features are not widely utilized in the context of region description. In this paper, we introduce a new texture feature called center-symmetric local binary pattern (CS-LBP) that is a modified version of the well-known local binary pattern (LBP) feature. To combine the strengths of the SIFT and LBP, we use the CS-LBP as the local feature in the SIFT algorithm. The resulting descriptor is called the CS-LBP descriptor. In the matching and object category classification experiments, our descriptor performs favorably compared to the SIFT. Furthermore, the CS-LBP descriptor is computationally simpler than the SIFT.

Introduction

Local image feature detection and description have received a lot of attention in recent years. The basic idea is to first detect interest regions that are covariant to a class of transformations. Then, for each detected region, an invariant descriptor is built. Once we have the descriptors computed, we can match interest regions between images. This approach has many advantages. For example, local features can be made very tolerant to illumination changes, perspective distortions, image blur, image zoom, and so on. The approach is also very robust to occlusion. Local features have performed very well in many computer vision applications, such as image retrieval [1], wide baseline matching [2], object recognition [3], texture recognition [4], and robot localization [5].

The interest regions that are used as input to region description methods are provided by the interest region detectors. Many different approaches to region detection have been proposed. For example, some detectors detect corner-like regions while others extract blobs. Since this paper focuses on interest region description, we refer the reader to Ref. [6] for more information on interest region detection.

As with interest region detection, many different approaches to interest region description have been proposed. The methods emphasize different image properties such as pixel intensities, color, texture, and edges. Many of the proposed descriptors are distribution-based, i.e. they use histograms to represent different characteristics of appearance or shape. The intensity-domain spin image [4] is a 2D histogram where the dimensions are the distance from the center point and the intensity value. The SIFT descriptor [3] is a 3D histogram of gradient locations and orientations where the contribution to the location and orientation bins is weighted by the gradient magnitude and a Gaussian window overlaid on the region. Very similar to the SIFT descriptor is the GLOH descriptor [7], which replaces the Cartesian location grid used by the SIFT with a log-polar one, and applies PCA to reduce the size of the descriptor. Another SIFT-like descriptor using a log-polar location grid is the extension to the shape context presented in Ref. [7], which is a 3D histogram of edge point locations and orientations. The original shape context was computed only for edge point locations and not for orientations [8]. The SURF descriptor [9] builds on the strengths of the leading existing detectors and descriptors. It uses a Hessian matrix-based measure for the detector and Haar wavelet responses for the descriptor. By relying on integral images for image convolutions, computation time is significantly reduced. In Ref. [10], geodesic sampling is used to get neighborhood samples for interest points and then a geodesic-intensity histogram (GIH) is used as a deformation-invariant local descriptor. Other interest region descriptors proposed in the literature include PCA-SIFT [11], steerable filters [12], moment invariants [13], and complex filters [14].

There exist several recent comparative studies on region descriptors [7], [15], [16]. Almost without exception, the best results are reported for distribution-based descriptors such as SIFT. Recently, in Ref. [17], an interesting study on region descriptors was published. The authors break up the descriptor extraction process into a number of modules and put these together in different combinations. Many of these combinations give rise to published descriptors such as the SIFT but many are untested. Furthermore, learning is used to optimize the choice of parameter values for each candidate descriptor algorithm.

Many existing texture operators have not been used for describing interest regions so far [18]. One reason might be that, by using these methods, usually a large number of dimensions is required to build a reliable descriptor. The local binary pattern (LBP) texture operator [19], [20], [21] has been highly successful for various computer vision problems such as face recognition [22], background subtraction [23], and recognition of 3D textured surfaces [24], but it has not been used for describing interest regions so far. The LBP has properties that favor its usage in interest region description such as tolerance against illumination changes and computational simplicity. Drawbacks are that the operator produces a rather long histogram and is not very robust on flat image areas. To address these problems, in this paper, we propose a new LBP-based texture feature, called the center-symmetric local binary pattern (CS-LBP), that is more suitable for the given problem.

Since the SIFT and other distribution-based descriptors similar to it [7], [8], [9] have shown state-of-the-art performance in different problems, we decided to focus on this approach. We were especially interested to see if the gradient orientation and magnitude-based feature used in the SIFT algorithm could be replaced by a different feature that offers better or comparable performance. In this paper, we propose a new interest region descriptor, denoted as the CS-LBP descriptor, that combines the good properties of the SIFT and LBP. This is achieved by adopting the SIFT descriptor and using the novel CS-LBP feature instead of the original gradient feature. The new feature allows simplifications of several steps of the algorithm, which makes the resulting descriptor computationally simpler than SIFT. It also appears to be more robust to illumination changes than the SIFT descriptor. A preliminary version of this article has appeared in Ref. [25].

The rest of the paper is organized as follows. In Section 2, we first briefly describe the starting point for our work, i.e. the SIFT and LBP methods. Sections 3 and 4 give details for the CS-LBP operator and the CS-LBP descriptor, respectively. The experimental evaluation is carried out in Section 5. Finally, we conclude the paper in Section 6.

Section snippets

SIFT and LBP methods

Before presenting in detail the CS-LBP operator and the CS-LBP descriptor, we give a brief review of the SIFT and LBP methods that form the basis for our work.

Center-symmetric local binary patterns

The LBP operator, described in Section 2.2, produces rather long histograms and is therefore difficult to use in the context of a region descriptor. To address the problem we modified the scheme of how to compare the pixels in the neighborhood. Instead of comparing each pixel with the center pixel, we compare center-symmetric pairs of pixels as illustrated in Fig. 1. This halves the number of comparisons for the same number of neighbors. We can see that for eight neighbors, LBP produces 256 (2^8)
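The difference between the two comparison schemes can be illustrated with a minimal sketch for a single 3×3 neighborhood. The function names, the neighbor ordering, and the small `threshold` applied to the pair differences are assumptions made for this illustration; they are not taken from the excerpt above, and the full text should be consulted for the authors' exact definition. With eight neighbors there are only four center-symmetric pairs, so the code below can produce at most 2^4 = 16 distinct CS-LBP values, against 2^8 = 256 for the LBP.

```python
import numpy as np

# Eight neighbors of the center (1, 1) in circular order: E, NE, N, NW, W, SW, S, SE.
# With this ordering, neighbor i and neighbor i + 4 are center-symmetric.
OFFSETS = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]


def lbp_code(patch):
    """Basic 8-neighbor LBP: each neighbor is compared with the center pixel."""
    center = patch[1, 1]
    code = 0
    for i, (r, c) in enumerate(OFFSETS):
        if patch[r, c] >= center:
            code |= 1 << i  # 8 comparisons -> 8 bits -> values 0..255
    return code


def cs_lbp_code(patch, threshold=0.01):
    """CS-LBP sketch: opposite neighbors are compared with each other.

    `threshold` is an illustrative assumption; it keeps flat areas from
    producing bits that depend only on noise.
    """
    code = 0
    for i in range(4):
        r1, c1 = OFFSETS[i]
        r2, c2 = OFFSETS[i + 4]
        if patch[r1, c1] - patch[r2, c2] > threshold:
            code |= 1 << i  # 4 comparisons -> 4 bits -> values 0..15
    return code
```

The center pixel does not enter the CS-LBP comparison at all, and a perfectly flat patch maps to code 0 under this sketch.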

CS-LBP descriptor

In the following, we present our CS-LBP descriptor in detail. The input for the descriptor is a normalized interest region. The process is depicted in Fig. 2. The region detection and normalization steps are described in Section 5.1. In our experiments, the region size after normalization is fixed to 41×41 pixels and the pixel values lie between 0 and 1.
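As a rough sketch of how such a descriptor could be assembled from a normalized 41×41 region with values in [0, 1], the code below computes a CS-LBP code for every interior pixel, divides the code map into a Cartesian grid of cells, builds one 16-bin histogram per cell, and concatenates the histograms. The 4×4 grid, the `threshold` value, the hard (non-interpolated) binning, and the final L2 normalization are all assumptions chosen for illustration; the excerpt does not state the authors' settings, and the function names are hypothetical.

```python
import numpy as np


def cs_lbp_map(region, threshold=0.01):
    """CS-LBP code (0..15) for every interior pixel, using the 8 nearest neighbors."""
    # Shifted views of the region; each tuple holds a center-symmetric pair.
    pairs = [
        (region[1:-1, 2:], region[1:-1, :-2]),  # E  vs W
        (region[:-2, 2:], region[2:, :-2]),     # NE vs SW
        (region[:-2, 1:-1], region[2:, 1:-1]),  # N  vs S
        (region[:-2, :-2], region[2:, 2:]),     # NW vs SE
    ]
    codes = np.zeros(pairs[0][0].shape, dtype=np.int64)
    for bit, (a, b) in enumerate(pairs):
        codes += ((a - b) > threshold).astype(np.int64) << bit
    return codes


def cs_lbp_descriptor(region, grid=4, threshold=0.01):
    """Concatenate per-cell CS-LBP histograms over a grid x grid layout (assumed)."""
    codes = cs_lbp_map(region, threshold)
    row_groups = np.array_split(np.arange(codes.shape[0]), grid)
    col_groups = np.array_split(np.arange(codes.shape[1]), grid)
    hists = []
    for rows in row_groups:
        for cols in col_groups:
            cell = codes[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
            hists.append(np.bincount(cell.ravel(), minlength=16))
    desc = np.concatenate(hists).astype(np.float64)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc  # unit length (assumed normalization)
```

Under these assumptions a 41×41 region yields a 39×39 code map and a descriptor of 4 × 4 × 16 = 256 values; a different grid size changes the length accordingly.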

Experimental evaluation

We use two well-known protocols to evaluate the proposed CS-LBP descriptor. Both are freely available on the Internet. The first protocol is a matching protocol that is designed for matching interest regions between a pair of images [27]. The second protocol is the PASCAL Visual Object Classes Challenge 2006 protocol which is an object category classification protocol [28]. We compare the performance of the CS-LBP descriptor to the state-of-the-art descriptor SIFT. This allows us to evaluate

Conclusions

A new method for interest region description was presented. The proposed CS-LBP descriptor combines the strengths of two well-known methods, the SIFT descriptor and the LBP texture operator. It uses a SIFT-like grid and replaces SIFT's gradient features with an LBP-based feature, i.e. the CS-LBP, which was also introduced in this paper. The CS-LBP feature has many properties that make it well suited for this task, namely a relatively short feature histogram, tolerance to illumination changes, and

Acknowledgment

The financial support provided by the Academy of Finland and the Infotech Oulu Graduate School is gratefully acknowledged.

About the Author—MARKO HEIKKILÄ received the M.Sc. degree in Electrical Engineering in 2004 from the University of Oulu, Finland. He was a researcher in the Machine Vision Group at the University of Oulu. Currently he is with Nokia Corporation.

References (32)

  • T. Ojala et al., A comparative study of texture measures with classification based on feature distributions, Pattern Recognition (1996)
  • M. Pietikäinen et al., View-based recognition of real-world textures, Pattern Recognition (2004)
  • K. Mikolajczyk, C. Schmid, Indexing based on scale invariant interest points, in: 8th IEEE International Conference on...
  • T. Tuytelaars et al., Matching widely separated views based on affine invariant regions, Int. J. Comput. Vision (2004)
  • D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision (2004)
  • S. Lazebnik et al., A sparse texture representation using local affine regions, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • S. Se, D. Lowe, J. Little, Global localization using distinctive visual features, in: IEEE/RSJ International Conference...
  • K. Mikolajczyk et al., A comparison of affine region detectors, Int. J. Comput. Vision (2005)
  • K. Mikolajczyk et al., A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • S. Belongie et al., Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Mach. Intell. (2002)
  • H. Bay, T. Tuytelaars, L.V. Gool, SURF: speeded up robust features, in: European Conference on Computer Vision, vol. 1,...
  • H. Ling, D.W. Jacobs, Deformation invariant image matching, in: 10th IEEE International Conference on Computer Vision,...
  • Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in: IEEE Conference on...
  • W. Freeman et al., The design and use of steerable filters, IEEE Trans. Pattern Anal. Mach. Intell. (1991)
  • L.J.V. Gool, T. Moons, D. Ungureanu, Affine/photometric invariants for planar intensity patterns, in: 4th European...
  • F. Schaffalitzky, A. Zisserman, Multi-view matching for unordered image sets, in: 7th European Conference on Computer...

About the Author—MATTI PIETIKÄINEN received the Doctor of Science in Technology degree from the University of Oulu, Finland, in 1982. In 1981, he established the Machine Vision Group at the University of Oulu. This group has achieved a highly respected position in its field, and its research results have been widely exploited in industry. Currently, he is a professor of information engineering, scientific director of Infotech Oulu Research Center, and leader of the Machine Vision Group at the University of Oulu. His research interests include texture-based computer vision, face analysis, and their applications in human–computer interaction, person identification, and visual surveillance. From 1980 to 1981 and from 1984 to 1985, he visited the Computer Vision Laboratory at the University of Maryland. He has authored over 200 papers in international journals, books, and conference proceedings and about 100 other publications or reports. He has been an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and Pattern Recognition journals. He was guest editor (with L.F. Pau) of a two-part special issue on Machine Vision for Advanced Production for the International Journal of Pattern Recognition and Artificial Intelligence (also reprinted as a book by World Scientific in 1996). He was also the editor of the book Texture Analysis in Machine Vision (World Scientific, 2000) and has served as a reviewer for numerous journals and conferences. He was the president of the Pattern Recognition Society of Finland from 1989 to 1992. From 1989 to 2007, he served as a member of the Governing Board of the International Association for Pattern Recognition (IAPR) and became one of the founding fellows of the IAPR in 1994. He has also served on committees of several international conferences. He was an area chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’07) and is co-chair of Workshops of the International Conference on Pattern Recognition (ICPR ’08). He is a senior member of the IEEE, and was the vice-chair of the IEEE Finland Section.

About the Author—CORDELIA SCHMID holds an M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis on “Local Greyvalue Invariants for Image Matching and Retrieval” received the best thesis award from INPG in 1996. She received the Habilitation degree in 2001 for her thesis entitled “From Image Matching to Learning Visual Models”. Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University in 1996–1997. Since 1997 she has held a permanent research position at INRIA Rhone-Alpes, where she is a research director and directs the INRIA team called LEAR for LEArning and Recognition in Vision. Dr. Schmid is the author of over eighty technical publications. She has been an Associate Editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (2001–2005) and for the International Journal of Computer Vision (2004–), and she was program chair of the 2005 IEEE Conference on Computer Vision and Pattern Recognition. In 2006, she was awarded the Longuet-Higgins prize for fundamental contributions in computer vision that have withstood the test of time. She is a senior member of the IEEE.
