Real-time foreground–background segmentation using codebook model
Introduction
The capability of extracting moving objects from a video sequence captured using a static camera is a typical first step in visual surveillance. A common approach for discriminating moving objects from the background is detection by background subtraction. The idea of background subtraction is to subtract or difference the current image from a reference background model. The subtraction identifies non-stationary or new objects.
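The subtraction step described above can be illustrated with a minimal sketch (pixel-wise absolute differencing against a fixed reference frame; the threshold value here is illustrative, not from the paper):

```python
import numpy as np

def subtract_background(frame, background, threshold=30):
    """Classify pixels as foreground where the absolute difference
    from the reference background exceeds a threshold."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold  # boolean foreground mask

# Toy example: a uniform background with a bright "new object"
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200
mask = subtract_background(frame, background)
```

Real systems replace the static reference frame with an adaptive model, which is the subject of the rest of this paper.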
The simplest background model assumes that the intensity values of a pixel can be modeled by a single unimodal distribution. This basic model is used in [1], [2]. However, a single-mode model cannot handle multiple backgrounds, like waving trees. The generalized mixture of Gaussians (MOG) in [3] has been used to model complex, non-static backgrounds. Methods employing MOG have been widely incorporated into algorithms that utilize Bayesian frameworks [4], dense depth data [5], color and gradient information [6], mean-shift analysis [7], and region-based information [8].
MOG does have some disadvantages. Backgrounds with fast variations cannot be modeled accurately with just a few Gaussians, so the method may fail to provide sensitive detection (as noted in [9]). In addition, MOG faces a trade-off in the learning rate used to adapt to background changes. With a low learning rate, it produces a wide model that has difficulty detecting a sudden change in the background. If the model adapts too quickly, slowly moving foreground pixels are absorbed into the background model, resulting in a high false negative rate. This is the foreground aperture problem described in [10].
To overcome these problems, a non-parametric technique that estimates the probability density function at each pixel from many samples using kernel density estimation was developed in [9]. It is able to adapt very quickly to changes in the background process and to detect targets with high sensitivity. A more advanced approach using adaptive kernel density estimation was recently proposed in [11].
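The per-pixel density estimate of [9] can be sketched as follows for a single grayscale pixel; the Gaussian kernel bandwidth and density threshold used here are illustrative, not the paper's values:

```python
import numpy as np

def kde_foreground(x, samples, bandwidth=15.0, threshold=1e-4):
    """Estimate p(x) at one pixel from recent background samples using a
    Gaussian kernel; declare foreground if the density is below a
    threshold chosen for a target false-alarm rate."""
    z = (x - np.asarray(samples, dtype=float)) / bandwidth
    p = np.mean(np.exp(-0.5 * z * z) / (bandwidth * np.sqrt(2 * np.pi)))
    return p < threshold

# Recent samples at one pixel hovering around intensity 100
samples = [100, 102, 98, 101, 99]
```

The memory cost is what limits this approach: every pixel must retain its recent sample history, which is the constraint our compressed codebook model is designed to avoid.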
However, the non-parametric technique in [9] cannot be used when long time periods are needed to sample the background sufficiently—for example, when there is significant wind load on vegetation—due mostly to memory constraints. Our algorithm constructs a highly compressed background model that addresses this problem.
Pixel-based techniques assume that the time series of observations is independent at each pixel. In contrast, some researchers [5], [8], [10] employ a region- or frame-based approach, segmenting an image into regions or refining the low-level classification obtained at the pixel level. Markov random field techniques employed in [12], [13] can also model both temporal and spatial context. The algorithms in [14], [15] aim to segment foreground objects from dynamic textured backgrounds (e.g., water, escalators, waving trees). Furthermore, Amer et al. [16] describe interactions between low-level object segments and high-level information such as tracking or event description.
Our codebook (CB) background subtraction algorithm is designed to sample values over long periods of time without making parametric assumptions. Mixed backgrounds can be modeled by multiple codewords. The key features of the algorithm are
an adaptive and compact background model that can capture structural background motion over a long period of time with limited memory, allowing us to encode moving backgrounds or multiple changing backgrounds;
the capability of coping with local and global illumination changes;
unconstrained training that allows moving foreground objects in the scene during the initial training period;
layered modeling and detection, which maintains multiple layers of background representation.
Section snippets
Background modeling and detection
The CB algorithm adopts a quantization/clustering technique, inspired by Kohonen [18], [19], to construct a background model from long observation sequences. For each pixel, it builds a codebook consisting of one or more codewords. Samples at each pixel are clustered into the set of codewords based on a color distortion metric together with brightness bounds. Not all pixels have the same number of codewords. The clusters represented by codewords do not necessarily correspond to single Gaussian
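The codeword-matching test just described (a color distortion metric together with brightness bounds) can be sketched as follows; the tolerance eps and the bounding factors alpha and beta are illustrative defaults, and i_lo/i_hi stand for the minimum and maximum brightness recorded for the codeword:

```python
import numpy as np

def color_distortion(x, v):
    """Distance from pixel color x to the line through the origin and
    the codeword color v: separates chromaticity from brightness."""
    x, v = np.asarray(x, float), np.asarray(v, float)
    p2 = np.dot(x, v) ** 2 / np.dot(v, v)   # squared projection onto v
    return np.sqrt(max(np.dot(x, x) - p2, 0.0))

def matches_codeword(x, v, i_lo, i_hi, eps=10.0, alpha=0.7, beta=1.2):
    """A pixel x matches a codeword if its color distortion from the
    codeword color v is within eps AND its brightness lies inside
    bounds derived from the codeword's observed brightness range."""
    brightness = np.linalg.norm(x)
    low = alpha * i_hi
    high = min(beta * i_hi, i_lo / alpha)
    return color_distortion(x, v) <= eps and low <= brightness <= high
```

Because the distortion is measured against the color direction rather than the raw RGB value, a pixel that merely brightens or darkens (a shadow or highlight on the same surface) can still match its codeword, while a genuine color change does not.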
Detection results and comparison
Most existing background subtraction algorithms fail to work with low-bandwidth compressed videos, mainly due to spatial block compression that causes block artifacts, and temporal block compression that causes an abnormal distribution of encoding errors (random spikes). Fig. 4(a) is an image extracted from an MPEG video encoded at 70 kbits/s. Fig. 4(b) depicts a 20-times-scaled image of the standard deviations of the blue (B) channel values in the training set. It is easy to see that the distribution of pixel
Improvements
In order to make our technique more practically useful in a visual surveillance system, we improved the basic algorithm by layered modeling/detection and adaptive codebook updating.
Performance evaluation using PDR analysis
In this section we evaluate the performance of several background subtraction algorithms using perturbation detection rate (PDR) analysis. PDR measures, given a false alarm rate (FA-rate), the sensitivity of a background subtraction algorithm in detecting low-contrast targets against a background as a function of contrast, depending also on how well the model captures mixed (moving) background events. As an alternative to the common method of ROC analysis, it does not require foreground
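The core of the PDR measurement can be sketched as follows: perturb background samples by a contrast offset and report the fraction detected as foreground, with the detector's threshold already fixed for a target FA-rate. The toy Gaussian background and threshold below are illustrative, not data from the paper:

```python
import numpy as np

def pdr_curve(bg_samples, detect, deltas):
    """For each contrast delta, perturb the background samples by delta
    and report the fraction classified as foreground. `detect` is any
    pixel-wise classifier whose threshold has already been set to
    achieve a target false-alarm rate on unperturbed samples."""
    return [float(np.mean([detect(x + d) for x in bg_samples]))
            for d in deltas]

rng = np.random.default_rng(0)
bg = rng.normal(100, 5, size=1000)          # toy background: N(100, 5)
detect = lambda x: abs(x - 100) > 15        # toy 3-sigma detector
rates = pdr_curve(bg, detect, deltas=[0, 10, 30])
```

At delta = 0 the rate is simply the false-alarm rate; as delta grows, the rate approaches 1, and the shape of the curve in between is what distinguishes sensitive models from insensitive ones.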
Conclusion and discussion
Our new adaptive background subtraction algorithm, which is able to model a background from a long training sequence with limited memory, works well on moving backgrounds, illumination changes (using our color distortion measures), and compressed videos having irregular intensity distributions. It has other desirable features—unconstrained training and layered modeling/detection. Comparison with other multimode modeling algorithms shows that the codebook algorithm has good properties on several
References (21)
- Kohonen, Learning vector quantization, Neural Networks (1988).
- et al., Pfinder: real-time tracking of the human body, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997).
- Horprasert T, Harwood D, Davis LS, A statistical approach for real-time robust background subtraction and shadow...
- et al., Adaptive background mixture models for real-time tracking, IEEE International Conference on Computer Vision and Pattern Recognition (1999).
- Lee DS, Hull JJ, Erol B, A Bayesian framework for Gaussian mixture background modeling, IEEE International Conference...
- A framework for high-level feedback to adaptive, per-pixel, mixture-of-Gaussian background models, European Conference on Computer Vision (2002).
- Javed O, Shafique K, Shah M, A hierarchical approach to robust background subtraction using color and gradient...
- Porikli F, Tuzel O, Human body tracking by adaptive background models and mean-shift analysis, IEEE International...
- Cristani M, Bicego M, Murino V, Integrated region- and pixel-based approach to background modelling, Proceedings of...
- et al., Non-parametric model for background subtraction, European Conference on Computer Vision (2000).