
Real-Time Imaging

Volume 11, Issue 3, June 2005, Pages 172-185

Real-time foreground–background segmentation using codebook model

https://doi.org/10.1016/j.rti.2004.12.004

Abstract

We present a real-time algorithm for foreground–background segmentation. Sample background values at each pixel are quantized into codebooks, which represent a compressed form of the background model for a long image sequence. This allows us to capture structural background variation due to periodic-like motion over a long period of time under limited memory. The codebook representation is efficient in memory and speed compared with other background modeling techniques. Our method can handle scenes containing moving backgrounds or illumination variations, and it achieves robust detection for different types of videos. We compare our method with other multimode modeling techniques.

In addition to the basic algorithm, we present two features that improve it: layered modeling/detection and adaptive codebook updating.

For performance evaluation, we have applied perturbation detection rate analysis to four background subtraction algorithms and two videos of different types of scenes.

Introduction

The capability of extracting moving objects from a video sequence captured using a static camera is a typical first step in visual surveillance. A common approach for discriminating moving objects from the background is detection by background subtraction. The idea of background subtraction is to subtract or difference the current image from a reference background model. The subtraction identifies non-stationary or new objects.
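The subtraction step itself can be sketched in a few lines; the following minimal NumPy illustration assumes a fixed intensity threshold, which is not part of the paper's method:

```python
import numpy as np

def subtract_background(frame, background, threshold=25):
    """Mark pixels as foreground where the current frame differs from
    the reference background by more than `threshold` intensity levels."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Toy example: a flat grey background and one frame with a bright object.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200          # a new, non-stationary object
mask = subtract_background(frame, background)
```

All the algorithms discussed below differ mainly in what replaces the single reference image `background` and the fixed threshold.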

The simplest background model assumes that the intensity values of a pixel can be modeled by a single unimodal distribution. This basic model is used in [1], [2]. However, a single-mode model cannot handle multiple backgrounds, like waving trees. The generalized mixture of Gaussians (MOG) in [3] has been used to model complex, non-static backgrounds. Methods employing MOG have been widely incorporated into algorithms that utilize Bayesian frameworks [4], dense depth data [5], color and gradient information [6], mean-shift analysis [7], and region-based information [8].
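A much-simplified, single-channel sketch of a per-pixel mixture of Gaussians in the spirit of [3] is shown below; the constant update factor and all thresholds are illustrative simplifications, not values from the cited work:

```python
import numpy as np

class PixelMOG:
    """Single-channel, per-pixel mixture of Gaussians, loosely following
    Stauffer and Grimson [3]; the constant update factor and all
    thresholds here are illustrative simplifications."""

    def __init__(self, k=3, alpha=0.05, init_var=225.0):
        self.alpha = alpha
        self.init_var = init_var
        self.mu = np.zeros(k)
        self.var = np.full(k, init_var)
        self.w = np.full(k, 1.0 / k)

    def update(self, x):
        match = np.abs(x - self.mu) < 2.5 * np.sqrt(self.var)
        if match.any():
            i = int(np.argmax(match))          # first matching component
            self.mu[i] += self.alpha * (x - self.mu[i])
            self.var[i] += self.alpha * ((x - self.mu[i]) ** 2 - self.var[i])
            self.w += self.alpha * (match.astype(float) - self.w)
        else:
            i = int(np.argmin(self.w))         # replace the weakest component
            self.mu[i], self.var[i], self.w[i] = x, self.init_var, 0.05
            self.w /= self.w.sum()

    def is_foreground(self, x, w_thresh=0.4):
        """Foreground if x matches no component with substantial weight."""
        match = np.abs(x - self.mu) < 2.5 * np.sqrt(self.var)
        return not bool((match & (self.w > w_thresh)).any())
```

After training on a stable value, that value matches a high-weight component while a far-off value matches none, which is the basic MOG classification rule.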

MOG does have some disadvantages. Backgrounds with fast variations cannot be modeled accurately with just a few Gaussians, and the method may fail to provide sensitive detection, as noted in [9]. In addition, MOG faces a trade-off in choosing the learning rate for adapting to background changes. With a low learning rate, it produces a wide model that has difficulty detecting a sudden change in the background. If the model adapts too quickly, slowly moving foreground pixels are absorbed into the background model, resulting in a high false negative rate. This is the foreground aperture problem described in [10].

To overcome these problems, a non-parametric technique that estimates the probability density function at each pixel from many samples using kernel density estimation was developed in [9]. It adapts very quickly to changes in the background process and detects targets with high sensitivity. A more advanced approach using adaptive kernel density estimation was recently proposed in [11].
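A per-pixel kernel density estimate in the spirit of [9] can be sketched as follows; the Gaussian kernel with a fixed, assumed bandwidth is a simplification ([9] estimates kernel bandwidths from the data):

```python
import numpy as np

def kde_bg_probability(x, samples, bandwidth=5.0):
    """Estimate Pr(x | background) at one pixel from its recent samples
    with a Gaussian kernel; the fixed bandwidth is a simplification."""
    z = (x - np.asarray(samples, dtype=float)) / bandwidth
    return float(np.mean(np.exp(-0.5 * z * z)) / (bandwidth * np.sqrt(2 * np.pi)))

# A pixel is classified as foreground when the estimate falls below a threshold.
recent = [98, 100, 101, 99, 100, 102, 100, 97]
```

Note that sensitivity comes at a memory cost: the model must retain many raw samples per pixel, which motivates the compressed codebook representation below.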

However, the non-parametric technique in [9] cannot be used when long time periods are needed to sample the background sufficiently (for example, when there is significant wind load on vegetation), due mostly to memory constraints. Our algorithm constructs a highly compressed background model that addresses this problem.

Pixel-based techniques assume that the time series of observations at each pixel is independent. In contrast, some researchers [5], [8], [10] employ a region- or frame-based approach, segmenting an image into regions or refining the low-level classification obtained at the pixel level. Markov random field techniques, employed in [12], [13], can also model both temporal and spatial context. The algorithms in [14], [15] aim to segment foreground objects from dynamic textured backgrounds (e.g., water, escalators, waving trees). Furthermore, Amer et al. [16] describe interactions between low-level object segments and high-level information such as tracking or event description.

Our codebook (CB) background subtraction algorithm was designed to sample values over long times, without making parametric assumptions. Mixed backgrounds can be modeled by multiple codewords. The key features of the algorithm are:

  • an adaptive and compact background model that can capture structural background motion over a long period of time under limited memory. This allows us to encode moving backgrounds or multiple changing backgrounds;

  • the capability of coping with local and global illumination changes;

  • unconstrained training that allows moving foreground objects in the scene during the initial training period;

  • layered modeling and detection, allowing us to maintain multiple background layers.
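The per-pixel bookkeeping behind these features can be sketched as follows. This is a reduced illustration, not the paper's full algorithm: the match test here is plain Euclidean distance standing in for the paper's color distortion metric, and the threshold is an assumed value.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Codeword:
    v: tuple            # running mean RGB vector
    i_min: float        # lowest brightness observed for this codeword
    i_max: float        # highest brightness observed
    freq: int = 1       # number of samples absorbed

@dataclass
class PixelCodebook:
    words: list = field(default_factory=list)

    def train(self, rgb, eps=10.0):
        """Absorb one training sample: update the first matching codeword,
        or create a new one. `eps` is an assumed matching threshold."""
        brightness = sum(rgb)
        for w in self.words:
            if math.dist(rgb, w.v) < eps:      # simplified match test
                n = w.freq
                w.v = tuple((n * a + b) / (n + 1) for a, b in zip(w.v, rgb))
                w.i_min = min(w.i_min, brightness)
                w.i_max = max(w.i_max, brightness)
                w.freq += 1
                return
        self.words.append(Codeword(v=tuple(rgb), i_min=brightness, i_max=brightness))

cb = PixelCodebook()
for sample in [(100, 100, 100), (102, 99, 101), (30, 30, 80)]:
    cb.train(sample)
```

A pixel that alternates between two background appearances (e.g., a waving branch and the sky behind it) simply accumulates two codewords, which is how mixed backgrounds are encoded in bounded memory.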

In Section 2, we describe the codebook construction algorithm and the color and brightness metrics used for detection. We show, in Section 3, that the method is suitable for both stationary and moving backgrounds in different types of scenes, and applicable to compressed videos such as MPEG. Important improvements to the basic algorithm are presented in Section 4: layered modeling/detection and adaptive codebook updating. In Section 5, a performance evaluation technique, perturbation detection rate analysis, is used to evaluate four pixel-based algorithms. Finally, conclusions and discussion are presented in Section 6.

Section snippets

Background modeling and detection

The CB algorithm adopts a quantization/clustering technique, inspired by Kohonen [18], [19], to construct a background model from long observation sequences. For each pixel, it builds a codebook consisting of one or more codewords. Samples at each pixel are clustered into the set of codewords based on a color distortion metric together with brightness bounds. Not all pixels have the same number of codewords. The clusters represented by codewords do not necessarily correspond to single Gaussian …
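The two tests named above can be sketched as follows. The color distortion is the distance from an observed color to the line through the origin and the codeword's color, so brightness changes slide along the line while genuine color changes move away from it; the `alpha`/`beta` values below are illustrative parameters of the kind the paper tunes, not its exact settings.

```python
import numpy as np

def color_distortion(x, v):
    """Distance from observed RGB color x to the line through the origin
    and codeword color v."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    proj_sq = (x @ v) ** 2 / (v @ v)       # squared projection of x onto v
    return float(np.sqrt(max(x @ x - proj_sq, 0.0)))

def brightness_in_bounds(i, i_min, i_max, alpha=0.5, beta=1.2):
    """Accept brightness i within a band derived from the codeword's
    observed [i_min, i_max]; alpha and beta are illustrative values."""
    low = alpha * i_max
    hi = min(beta * i_max, i_min / alpha)
    return low <= i <= hi
```

For example, a shadowed pixel keeps the same color direction (small distortion, lower brightness) and can still match its codeword, while a genuinely different color fails the distortion test even at equal brightness.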

Detection results and comparison

Most existing background subtraction algorithms fail on low-bandwidth compressed videos, mainly due to spatial block compression, which causes block artifacts, and temporal block compression, which causes an abnormal distribution of encoding (random spikes). Fig. 4(a) is an image extracted from an MPEG video encoded at 70 kbits/s. Fig. 4(b) depicts a 20-times scaled image of the standard deviations of blue (B)-channel values in the training set. It is easy to see that the distribution of pixel …

Improvements

To make our technique more practically useful in a visual surveillance system, we improved the basic algorithm with two extensions: layered modeling/detection and adaptive codebook updating.

Performance evaluation using PDR analysis

In this section, we evaluate the performance of several background subtraction algorithms using perturbation detection rate (PDR) analysis. Given a false alarm rate (FA-rate), PDR measures the sensitivity of a background subtraction algorithm in detecting low-contrast targets against the background as a function of contrast (Δ); it also reflects how well the model captures mixed (moving) background events. As an alternative to the common method of ROC analysis, it does not require foreground …
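The idea can be sketched for a toy single-Gaussian background model; the specific model, sample sizes, and contrasts below are assumptions for illustration, whereas the paper applies the analysis to four real algorithms:

```python
import numpy as np

def pdr_curve(bg_samples, deltas, fa_rate=0.01):
    """Perturbation detection rate sketch: fix the decision threshold so
    that the background itself triggers at the target false-alarm rate,
    then measure how often background values shifted by a small contrast
    delta are detected as foreground."""
    mu = bg_samples.mean()
    scores = np.abs(bg_samples - mu)                 # distance from the model
    thresh = np.quantile(scores, 1.0 - fa_rate)      # FA-rate-matched threshold
    return [float((np.abs(bg_samples + d - mu) > thresh).mean()) for d in deltas]

rng = np.random.default_rng(0)
bg = rng.normal(100.0, 2.0, 10_000)
rates = pdr_curve(bg, deltas=[0.0, 5.0, 20.0])
```

Because every algorithm is evaluated at the same FA-rate, the resulting detection-rate-versus-Δ curves compare sensitivity directly, without needing labeled foreground.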

Conclusion and discussion

Our new adaptive background subtraction algorithm, which can model a background from a long training sequence with limited memory, works well on moving backgrounds, illumination changes (using our color distortion measures), and compressed videos having irregular intensity distributions. It has other desirable features: unconstrained training and layered modeling/detection. Comparison with other multimode modeling algorithms shows that the codebook algorithm has good properties on several …

References (21)

  • T. Kohonen, Learning vector quantization, Neural Networks (1988).
  • C.R. Wren et al., Pfinder: real-time tracking of the human body, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997).
  • T. Horprasert, D. Harwood, L.S. Davis, A statistical approach for real-time robust background subtraction and shadow...
  • C. Stauffer et al., Adaptive background mixture models for real-time tracking, IEEE International Conference on Computer Vision and Pattern Recognition (1999).
  • D.S. Lee, J.J. Hull, B. Erol, A Bayesian framework for Gaussian mixture background modeling, IEEE International Conference...
  • M. Harville, A framework for high-level feedback to adaptive, per-pixel, mixture-of-Gaussian background models, European Conference on Computer Vision (2002).
  • O. Javed, K. Shafique, M. Shah, A hierarchical approach to robust background subtraction using color and gradient...
  • F. Porikli, O. Tuzel, Human body tracking by adaptive background models and mean-shift analysis, IEEE International...
  • M. Cristani, M. Bicego, V. Murino, Integrated region- and pixel-based approach to background modelling, Proceedings of...
  • A. Elgammal et al., Non-parametric model for background subtraction, European Conference on Computer Vision (2000).
There are more references available in the full text version of this article.
