Pattern Recognition Letters

Volume 28, Issue 3, 1 February 2007, Pages 320-328
A new motion detection algorithm based on ΣΔ background estimation

https://doi.org/10.1016/j.patrec.2006.04.007

Abstract

Motion detection using a stationary camera can be done by estimating the static scene (background). For that purpose, we propose a new method based on a simple recursive nonlinear operator, the ΣΔ filter. Used along with a spatiotemporal regularization algorithm, it allows robust, computationally efficient and accurate motion detection. To deal with complex scenes containing a wide range of motion models with very different time constants, we propose a generalization of the basic model to multiple ΣΔ estimation.

Introduction

The detection of moving objects in an image sequence is a very important low-level task for many computer vision applications, such as video surveillance, traffic monitoring or sign language recognition. When the camera is stationary, a class of methods usually employed is background subtraction. The principle of these methods is to build a model of the static scene (i.e., without moving objects) called background, and then to compare every frame of the sequence to this background in order to discriminate the regions of unusual motion, called foreground (the moving objects).

Many algorithms have been developed for background subtraction; recent reviews and evaluations can be found in (Lee and Hedley, 2002; Chalidabhongse et al., 2003; Cheung and Kamath, 2004; Piccardi, 2004). In this paper, we are more specifically interested in video surveillance systems with long autonomy. The difficulty in devising background subtraction algorithms in such a context lies in satisfying several constraints:

  • The system must keep working without human interaction for a long time, and must therefore take into account gradual or sudden changes such as illumination variation or new static objects settling in the scene. This means that the background must be temporally adaptive.

  • The system must be able to discard irrelevant motion such as waving bushes or flowing water. It should also be robust to slight oscillations of the camera. This means that there must be a local estimation for the confidence in the background value.

  • The system must be real-time, compact and low-power, so the algorithms must use few resources in terms of computing power and memory.

The first two conditions imply that statistical measures of the temporal activity must be locally available at every pixel, and constantly updated. This excludes any basic approach such as using a single model (the previous frame or a temporal average) for the background, and global thresholding for the decision.

Some background estimation methods are based on the analysis of the histogram of the values taken by each pixel within a fixed number K of past frames. The mean, the median or the mode of the histogram can be chosen to set the background value, and the foreground can be discriminated by comparing the difference between the current frame and the background with the histogram variance. More sophisticated techniques are also based on the history of the K past frames: linear prediction (Toyoma et al., 1999), kernel density estimation (Elgammal et al., 2000; Mittal and Paragios, 2004), or principal component analysis (Oliver et al., 2000). These methods require a large amount of memory, since K needs to be large (usually more than 50) for robustness. They are therefore not compatible with our third condition.
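To make the memory cost of such history-based methods concrete, here is a minimal per-pixel sketch, not the implementation of any of the cited methods. The class name `WindowBackground`, the `scale` factor and the floor of one gray level on the spread are assumptions made for illustration, and the standard deviation of the history stands in for the histogram variance mentioned above.

```python
from collections import deque
from statistics import median, pvariance


class WindowBackground:
    """Per-pixel background taken from the last K samples (hypothetical helper)."""

    def __init__(self, k):
        # The whole window is stored, so memory grows as K values per pixel.
        self.history = deque(maxlen=k)

    def update(self, sample):
        # The oldest sample is dropped automatically once K samples are stored.
        self.history.append(sample)

    def background(self):
        # Median of the stored history; mean or mode could be used instead.
        return median(self.history)

    def is_foreground(self, sample, scale=2.0):
        # Compare the deviation from the background with the spread of the
        # history; `scale` and the floor of 1.0 are illustrative choices.
        spread = pvariance(self.history) ** 0.5
        return abs(sample - self.background()) > scale * max(spread, 1.0)
```

With K above 50, as the text suggests, this storage is needed for every pixel of the image, which is what conflicts with the low-memory requirement.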

Much more attractive for our requirements are the recursive methods, which do not keep a histogram in memory for each pixel, but rather a fixed number of estimates computed recursively. These estimates can be the mean and variance of a Gaussian distribution (Wren et al., 1997), different states of the background (e.g., its values and temporal derivatives) estimated by a predictive filter (Karmann and von Brandt, 1990), or a recursive estimation of the extremal values (Richefeu and Manzanera, 2004). But it is difficult to get robust estimates of the background within a linear recursive framework, unless a multi-modal distribution (e.g., multiple Gaussians (Stauffer and Grimson, 2000; Power and Schoonees, 2002)) is explicitly used, which comes at the price of increased complexity and memory requirements. Furthermore, these methods rely on parameters such as the learning rates used in the recursive linear filters, which set the relative weights of the background states and the new observations. Tuning these parameters can be tricky, which makes the first condition stated above difficult to fulfill.
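The role of the learning rate in a linear recursive estimator can be illustrated with a running Gaussian model. This is a generic sketch under stated assumptions, not the exact formulation of any cited paper: the class name `RunningGaussian`, the default `alpha` and the initial variance are all hypothetical.

```python
class RunningGaussian:
    """Recursive per-pixel mean and variance (illustrative, hypothetical names)."""

    def __init__(self, first_sample, alpha=0.05, initial_var=25.0):
        self.mean = float(first_sample)
        self.var = initial_var
        # alpha weights the new observation against the stored state.
        self.alpha = alpha

    def update(self, sample):
        diff = sample - self.mean
        # Linear update: the estimate moves by a fraction alpha of the error,
        # so a large outlier shifts the mean in proportion to its amplitude.
        self.mean += self.alpha * diff
        self.var = (1.0 - self.alpha) * self.var + self.alpha * diff * diff
        return self.mean
```

Only two values per pixel are stored, but the behaviour depends strongly on `alpha`: a large value adapts quickly and absorbs moving objects into the background, while a small value adapts slowly to illumination changes.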

A recursive approximation of the temporal median was proposed in (McFarlane and Schofield, 1995) to compute the background. The interest of this method lies in the robustness provided by its nonlinearity, compared to the linear recursive average, and in its very low computational cost. In this article, we investigate some nice properties of this operator, introducing the notion of ΣΔ estimation, and using it to obtain a locally adaptive motion detection.

In Section 2, we present the basic ΣΔ estimation method. The ΣΔ filter is presented and used to compute two orders of temporal statistics for each pixel of the sequence, providing a pixel-level decision framework. Then, in Section 3, we exploit the spatial correlation in these data using new hybrid linear/morphological operators, and use higher level processing to enhance and regularize the detection solution. Some results are presented, illustrating the robustness and accuracy of the method in the case of a simple background (i.e., one single time-varying mode). For more complex scenes, we propose in Section 4 a generalization of the algorithm to multiple background estimation. Finally, conclusions and future work are presented in Section 5.

Section snippets

ΣΔ estimation

Our first background estimate, whose computation is shown in Table 1(1), is the same as in (McFarlane and Schofield, 1995), where I_t is the input sequence and M_t the estimated background value. The sign function sgn is defined as sgn(a) = −1 if a < 0, sgn(a) = 1 if a > 0, and sgn(a) = 0 if a = 0. So, at every frame, the estimate is simply incremented by one if it is smaller than the sample, or decremented by one if it is greater than the sample. If I_t is a discrete random signal, the ratio between the number
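The update rule described above can be sketched for a single pixel as follows. The function names are hypothetical, and initializing the estimate with the first sample is an assumption of this sketch rather than something stated in the text above.

```python
def sgn(a):
    """Sign function: -1 if a < 0, 1 if a > 0, 0 if a == 0."""
    return (a > 0) - (a < 0)


def sigma_delta_step(m_prev, i_t):
    """One update: move the estimate by one gray level toward the sample."""
    return m_prev + sgn(i_t - m_prev)


def sigma_delta_background(samples, m0=None):
    """Return the sequence of estimates M_t for one pixel.

    Assumption: when m0 is not given, the estimate starts at the first sample.
    """
    m = samples[0] if m0 is None else m0
    estimates = []
    for i_t in samples:
        m = sigma_delta_step(m, i_t)
        estimates.append(m)
    return estimates


# A single outlier (200) moves the estimate by only one gray level.
print(sigma_delta_background([10, 12, 12, 12, 200, 12]))
# prints [10, 11, 12, 12, 13, 12]
```

Because each step changes the estimate by at most one gray level regardless of the amplitude of the sample, an isolated large value has a bounded effect, which is where the robustness relative to a linear average comes from.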

Spatiotemporal processing

We recently presented a Markovian model to perform a spatiotemporal regularization of the pixel-level ΣΔ detection. It was an adaptation of the iterative algorithm presented in (Caplier et al., 1996; Lacassagne et al., 1999), using the pixel-level detection D_t as initialization, and the ΣΔ difference Δ_t and variance V_t as a pair of observation fields used in the design of the energy. Details can be found in (Manzanera and Richefeu, 2004).

We present here another regularization

Multiple background ΣΔ estimation

The spatiotemporal processing and relevance feedback, presented in the previous section, allow a visible enhancement of the robustness in the case of slowing down, stopping or radially moving objects. Nevertheless, the ΣΔ estimator is characterized by a time constant: its updating period, which has a dimension of number of gray levels per second. This induces a limitation of the basic approach in the adaptation capability to certain complex scenes, typically in the case of scenes permanently
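The effect of the updating period on adaptation speed can be illustrated with a small sketch. This is only an illustration of the time-constant idea under an assumed scheme (updating the estimate once every `period` frames); it is not the multiple-background algorithm of the paper, and the function name and parameters are hypothetical.

```python
def sigma_delta_periodic(samples, period, m0=None):
    """Sigma-delta estimate updated once every `period` frames (assumed scheme).

    The estimate can change by at most one gray level per `period` frames,
    so a longer period gives a slower-adapting background.
    """
    m = samples[0] if m0 is None else m0
    estimates = []
    for t, i_t in enumerate(samples):
        if t % period == 0:
            m += (i_t > m) - (i_t < m)  # elementary +1 / -1 / 0 step
        estimates.append(m)
    return estimates


# A step from 0 to 10 gray levels, observed over 8 frames after the change.
step = [0] + [10] * 8
fast = sigma_delta_periodic(step, period=1)
slow = sigma_delta_periodic(step, period=4)
print(fast[-1], slow[-1])  # prints 8 2
```

A single period thus fixes one adaptation speed, which is why scenes mixing motions with very different time constants are hard to handle with one estimator.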

Conclusions

We have presented a new algorithm allowing a robust and accurate detection of moving objects at a small cost in memory consumption and computational complexity. We have emphasized the nice properties of the ΣΔ filter for the detection of salient features in time-varying signals, showing that the interest of such a filter goes well beyond its temporal median convergence property.

We have proposed a new spatiotemporal regularization strategy, using an original hybrid reconstruction method and

Acknowledgements

The authors wish to thank Lionel Lacassagne, who proposed improvements on the original algorithm, and provided most of the sequences used in this article.

References (20)

  • Caplier, A., et al., 1996. Mrf based motion detection algorithm image processing board implementation. Traitement du signal.
  • Chalidabhongse, T.H., Kim, K., Harwood, D., Davis, L., 2003. A perturbation method for evaluating background...
  • Cheung, S.-C., Kamath, C., 2004. Robust techniques for background subtraction in urban traffic video. In: Proc. SPIE...
  • Denoulet, J., Mostafaoui, G., Lacassagne, L., Mérigot, A., 2005. Implementing motion markov detection on general...
  • Elgammal, A., Harwood, D., Davis, L., 2000. Non-parametric Model for Background Subtraction. In: Proc. IEEE ECCV....
  • Karmann, K.-P., von Brandt, A., 1990. Moving object recognition using an adaptive background memory. In: Time-Varying...
  • Komuro, T., et al., 2003. A digital vision chip specialized for high-speed target tracking. IEEE Trans. Electron Dev.
  • Lacassagne, L., Milgram, M., Garda, P., 1999. Motion detection, labeling, data association and tracking in real-time on...
  • Lee, B., Hedley, M., 2002. Background estimation for video surveillance. In: Proc. IVCNZ’02. pp....
  • Manzanera, A., Richefeu, J., Dec. 2004. A robust and computationally efficient motion detection algorithm based on Σ–Δ...