A new motion detection algorithm based on Σ–Δ background estimation
Introduction
The detection of moving objects in an image sequence is a very important low-level task for many computer vision applications, such as video surveillance, traffic monitoring or sign language recognition. When the camera is stationary, a class of methods usually employed is background subtraction. The principle of these methods is to build a model of the static scene (i.e., without moving objects) called background, and then to compare every frame of the sequence to this background in order to discriminate the regions of unusual motion, called foreground (the moving objects).
Many algorithms have been developed for background subtraction: recent reviews and evaluations can be found in (Lee and Hedley, 2002; Chalidabhongse et al., 2003; Cheung and Kamath, 2004; Piccardi, 2004). In this paper, we are more specifically interested in video surveillance systems with long autonomy. The difficulty in devising background subtraction algorithms in this context lies in satisfying several constraints:
- The system must keep working without human interaction for a long time, and must therefore take into account gradual or sudden changes such as illumination variation or new static objects settling in the scene. This means that the background must be temporally adaptive.
- The system must be able to discard irrelevant motion such as waving bushes or flowing water. It should also be robust to slight oscillations of the camera. This means that there must be a local estimate of the confidence in the background value.
- The system must be real-time, compact and low-power, so the algorithms must use few resources, in terms of both computing power and memory.
The first two conditions imply that statistical measures of the temporal activity must be locally available at every pixel, and constantly updated. This excludes any basic approach that uses a single model, such as the previous frame or a temporal average, for the background, and global thresholding for the decision.
Some background estimation methods are based on the analysis of the histogram of the values taken by each pixel within a fixed number K of past frames. The mean, the median or the mode of the histogram can be chosen to set the background value, and the foreground can be discriminated by comparing the difference between the current frame and the background with the histogram variance. More sophisticated techniques are also based on the history of the K past frames: linear prediction (Toyama et al., 1999), kernel density estimation (Elgammal et al., 2000; Mittal and Paragios, 2004), or principal component analysis (Oliver et al., 2000). These methods require a large amount of memory, since K needs to be large (usually more than 50) for robustness, so they are not compatible with our third condition.
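To make the memory argument concrete, here is a minimal sketch of a K-frame median background, using only the Python standard library. The class name `MedianBackground`, the helper `stored_values`, the flat-list frame representation and the 320×240 frame size are illustrative assumptions, not taken from any of the cited methods.

```python
from collections import deque
from statistics import median


class MedianBackground:
    """Background = per-pixel median of the last K frames (frames are flat lists)."""

    def __init__(self, k):
        # The whole K-frame history is stored: memory grows linearly with K.
        self.history = deque(maxlen=k)

    def update(self, frame):
        self.history.append(list(frame))
        # zip(*history) yields, for each pixel, its values over the stored frames.
        return [median(values) for values in zip(*self.history)]


def stored_values(k, width, height):
    """Number of pixel values kept in memory for a K-frame history."""
    return k * width * height


model = MedianBackground(k=5)
frames = [
    [10, 10, 10],
    [10, 10, 10],
    [10, 200, 10],  # a bright object crosses the middle pixel
    [10, 200, 10],
    [10, 10, 10],
]
for frame in frames:
    background = model.update(frame)

print(background)                    # the transient object is rejected by the median
print(stored_values(50, 320, 240))   # values stored for K = 50 on a 320x240 frame
```

With K = 50, such a model keeps 50 full frames in memory, which is the cost that conflicts with the third condition.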
Much more attractive for our requirements are the recursive methods, which do not keep a histogram in memory for each pixel, but rather a fixed number of estimates computed recursively. These estimates can be the mean and variance of a Gaussian distribution (Wren et al., 1997), different states of the background (e.g., its values and temporal derivatives) estimated by a predictive filter (Karmann and von Brandt, 1990), or a recursive estimation of the extremal values (Richefeu and Manzanera, 2004). But it is difficult to get robust estimates of the background within a linear recursive framework, unless a multi-modal distribution (e.g., multiple Gaussians (Stauffer and Grimson, 2000; Power and Schoonees, 2002)) is explicitly used, which comes at the price of increased complexity and memory requirements. Furthermore, these methods rely on parameters such as the learning rates used in the recursive linear filters, which set the relative weights of the background states and the new observations. Tuning these parameters can be tricky, which makes the first condition stated above difficult to fulfil.
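As an illustration of a linear recursive estimate and of the role of the learning rate, the following sketch updates a single per-pixel Gaussian model. It shows one common form of the recursion, not necessarily the exact formulation of the cited works; the function names, the learning rate `alpha` and the threshold factor `k` are assumptions chosen for the example.

```python
def update_gaussian(mean, var, sample, alpha):
    """One recursive update of a per-pixel Gaussian (mean, variance) model.

    alpha is the learning rate: it weights the new observation against
    the current background state.
    """
    diff = sample - mean
    new_mean = mean + alpha * diff
    new_var = (1.0 - alpha) * var + alpha * diff * diff
    return new_mean, new_var


def is_foreground(mean, var, sample, k=2.5):
    """Flag the sample if it lies more than k standard deviations from the mean."""
    return (sample - mean) ** 2 > (k * k) * var


mean, var = 100.0, 4.0
print(is_foreground(mean, var, 150.0))  # far from the model
print(is_foreground(mean, var, 101.0))  # within the expected variation
mean, var = update_gaussian(mean, var, 104.0, alpha=0.25)
print(mean, var)
```

Only two values are stored per pixel, but the behaviour depends directly on `alpha`: a large value adapts quickly and absorbs slow objects into the background, a small value adapts slowly to illumination changes.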
A recursive approximation of the temporal median was proposed in (McFarlane and Schofield, 1995) to compute the background. The interest of this method lies in the robustness provided by its nonlinearity, compared to the linear recursive average, and in its very low computational cost. In this article, we investigate some useful properties of this operator, introducing the notion of Σ–Δ estimation, and using it to obtain a locally adaptive motion detection.
In Section 2, we present the basic Σ–Δ estimation method. The Σ–Δ filter is introduced and used to compute two orders of temporal statistics for each pixel of the sequence, providing a pixel-level decision framework. Then, in Section 3, we exploit the spatial correlation in these data using new hybrid linear/morphological operators, and use higher-level processing to enhance and regularize the detection result. Some results are presented, illustrating the robustness and accuracy of the method in the case of a simple background (i.e., one single time-varying mode). For more complex scenes, we propose in Section 4 a generalization of the algorithm to multiple background estimation. Finally, conclusions and future work are presented in Section 5.
Section snippets
Σ–Δ estimation
Our first background estimate, whose computation is shown in Table 1(1), is the same as that of (McFarlane and Schofield, 1995), where I_t is the input sequence and M_t the estimated background value. The sign function sgn is defined as sgn(a) = −1 if a < 0, sgn(a) = 1 if a > 0, and sgn(a) = 0 if a = 0. So, at every frame, the estimate is simply incremented by one if it is smaller than the sample, or decremented by one if it is greater than the sample. If I_t is a discrete random signal, the ratio between the number
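The update rule described above (increment or decrement the estimate by one according to the sign of the difference with the sample) can be sketched as follows. This is a minimal illustration under stated assumptions: frames are flat lists of integer gray levels, and the names `sgn` and `sigma_delta_update` are chosen for the example rather than taken from the paper's Table 1.

```python
def sgn(a):
    """Sign function: -1 if a < 0, 1 if a > 0, 0 if a == 0."""
    return (a > 0) - (a < 0)


def sigma_delta_update(background, frame):
    """Move each background estimate one gray level toward the current sample."""
    return [m + sgn(i - m) for m, i in zip(background, frame)]


# One update step on three pixels: below, equal to, and above the sample.
print(sigma_delta_update([10, 10, 10], [12, 10, 3]))

# The first pixel starts 100 gray levels away from a constant input;
# the second pixel already matches it.
background = [0, 50]
frames_needed = 0
while background != [100, 50]:
    background = sigma_delta_update(background, [100, 50])
    frames_needed += 1
print(frames_needed)
```

Because the estimate moves by at most one gray level per frame, a step of 100 gray levels is absorbed in exactly 100 frames, whatever the amplitude of individual outliers. Only one integer per pixel is stored, and the update needs a comparison and an increment.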
Spatiotemporal processing
Recently, we have presented a Markovian model to perform a spatiotemporal regularization of the pixel-level Σ–Δ detection. It was an adaptation of the iterative algorithm presented in (Caplier et al., 1996; Lacassagne et al., 1999), using the pixel-level detection D_t as initialization, and the Σ–Δ difference Δ_t and variance V_t as a pair of observation fields used in the design of the energy. Details can be found in (Manzanera and Richefeu, 2004).
We present here another regularization
Multiple background Σ–Δ estimation
The spatiotemporal processing and relevance feedback presented in the previous section allow a visible enhancement of the robustness in the case of slowing, stopping or radially moving objects. Nevertheless, the Σ–Δ estimator is characterized by a time constant: its updating period, which has a dimension of number of gray levels per second. This induces a limitation of the basic approach in the adaptation capability to certain complex scenes, typically in the case of scenes permanently
Conclusions
We have presented a new algorithm allowing a robust and accurate detection of moving objects at a small cost in memory consumption and computational complexity. We have emphasized the useful properties of the Σ–Δ filter for the detection of salient features in time-varying signals, showing that the interest of such a filter goes well beyond its temporal median convergence property.
We have proposed a new spatiotemporal regularization strategy, using an original hybrid reconstruction method and
Acknowledgements
The authors wish to thank Lionel Lacassagne, who proposed improvements on the original algorithm, and provided most of the sequences used in this article.
References (20)
- et al., 1996. MRF based motion detection algorithm image processing board implementation. Traitement du signal.
- Chalidabhongse, T.H., Kim, K., Harwood, D., Davis, L., 2003. A perturbation method for evaluating background...
- Cheung, S.-C., Kamath, C., 2004. Robust techniques for background subtraction in urban traffic video. In: Proc. SPIE...
- Denoulet, J., Mostafaoui, G., Lacassagne, L., Mérigot, A., 2005. Implementing motion markov detection on general...
- Elgammal, A., Harwood, D., Davis, L., 2000. Non-parametric Model for Background Subtraction. In: Proc. IEEE ECCV....
- Karmann, K.-P., von Brandt, A., 1990. Moving object recognition using an adaptive background memory. In: Time-Varying...
- et al., 2003. A digital vision chip specialized for high-speed target tracking. IEEE Trans. Electron Dev.
- Lacassagne, L., Milgram, M., Garda, P., 1999. Motion detection, labeling, data association and tracking in real-time on...
- Lee, B., Hedley, M., 2002. Background estimation for video surveillance. In: Proc. IVCNZ’02. pp....
- Manzanera, A., Richefeu, J., Dec. 2004. A robust and computationally efficient motion detection algorithm based on Σ–Δ...