Pattern Recognition Letters

Volume 34, Issue 14, 15 October 2013, Pages 1687-1693

A novel multiplex cascade classifier for pedestrian detection

https://doi.org/10.1016/j.patrec.2013.04.015

Highlights

  • A novel multiplex classifier model is proposed for pedestrian detection.

  • A new fusion strategy is introduced by cascading classifiers.

  • The weighted linear regression model is introduced to train the weak classifiers in our model.

  • A structure table is introduced to label the foreground pixels by means of background differences.

Abstract

Reliable pedestrian detection is of great importance in visual surveillance. In this paper, we propose a novel multiplex classifier model composed of two multiplex cascade parts: a Haar-like cascade classifier and a shapelet cascade classifier. The Haar-like cascade classifier filters out most of the irrelevant image background, while the shapelet cascade classifier intensively detects head-shoulder features. The weighted linear regression model is introduced to train its weak classifiers. We also introduce a structure table to label the foreground pixels by means of background differences. The experimental results illustrate that our classifier model provides satisfactory detection accuracy. In particular, our detection approach also performs well on low-resolution images and relatively complicated backgrounds.

Introduction

Pedestrian detection is essential to intelligent visual surveillance systems because it provides fundamental information for semantic understanding (Dollár et al., 2012). Promising applications have appeared in several fields, such as video surveillance (Viola et al., 2005, Haritaoglu et al., 2000, Wang et al., 2012), video conferencing (Yang et al., 1996) and driver assistance systems (Zhao and Thorpe, 2000, Gavrila and Munder, 2007, Geronimo et al., 2010). The field has developed rapidly in recent years (Gao et al., 2009, Cerri et al., 2010, Enzweiler and Gavrila, 2009, Hou and Pang, 2011, Dollár et al., 2009).

Viola and Jones (2001) presented the Haar-like classifier, which detects objects rapidly by using AdaBoost classifier cascades in conjunction with Haar-like features. Rather than using the intensity values of individual pixels, Haar-like features use the difference in contrast between adjacent rectangular groups of pixels. These contrast differences identify relatively light and dark areas, and adjacent pixel groups with a relative contrast difference form a Haar-like feature. Haar-like features can easily be scaled by increasing or decreasing the size of the pixel group being examined, which allows them to detect objects of various sizes (Wilson and Fernandez, 2006). The main strength of the Haar-like classifier is its fast detection. In complicated environments or noisy images, however, adding more stages does not improve detection accuracy and only increases the computational cost.

Shape features are usually considered the most important cue for pedestrian detection. Sabzmeydani and Mori (2007) introduced the shapelet feature to describe local pieces of shape formed by human body parts such as the head, shoulders and body. Their shapelet cascade classifier achieved a lower error rate for pedestrian detection, but analyzing multi-level shapelet features is usually time-consuming.
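
The contrast between adjacent rectangular pixel groups that defines a Haar-like feature can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: it assumes a grayscale image stored as a list of rows, uses an integral image (summed-area table) so that any rectangle sum costs four lookups, and computes one horizontal two-rectangle feature. The function names and the toy image are hypothetical.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Left-half sum minus right-half sum of a w-by-h window (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 4x4 image (assumed data): bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)

# Large response where the window straddles the bright/dark edge.
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)

# Same feature at a smaller scale inside a uniform region gives no response.
small_feature = haar_two_rect_horizontal(ii, 0, 0, 2, 2)
```

Scaling the feature only changes the rectangle sizes passed to `rect_sum`, which is why the same feature can be evaluated at many window sizes without rescaling the image.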

In this paper, we propose a novel multiplex classifier model composed of two parts: a Haar-like cascade classifier and a shapelet cascade classifier. The two classifiers form multiplex cascades. The Haar-like cascade classifier effectively filters out most of the irrelevant image background, while the shapelet cascade classifier then intensively detects head-shoulder features. The weighted linear regression model is introduced to train the weak classifiers in our multiplex classifier model. We also introduce a structure table that labels the foreground pixels of each video frame by means of background differences, which helps to decrease the computational cost and further improve detection speed.
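
The exact regression formulation used to train the weak classifiers is not given in this excerpt. As a rough sketch of the general idea only, the code below fits a line to one scalar feature by weighted least squares and uses the sign of the fitted value as a weak classifier. The single-feature setup, the function names and the toy data are all assumptions made for illustration.

```python
def fit_weighted_line(xs, ys, ws):
    """Minimize sum_i w_i * (y_i - (a * x_i + b))**2 and return (a, b)."""
    total_w = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    if var_x == 0:
        # Degenerate case: the feature is constant, so predict the weighted mean.
        return 0.0, mean_y
    cov_xy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def weak_classify(x, a, b):
    """Label a sample +1 or -1 from the sign of the fitted regression value."""
    return 1 if a * x + b >= 0 else -1


# Toy data (assumed): one feature value per sample, labels in {-1, +1},
# and uniform sample weights such as a boosting round would start from.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [-1, -1, 1, 1]
ws = [0.25, 0.25, 0.25, 0.25]

a, b = fit_weighted_line(xs, ys, ws)
predictions = [weak_classify(x, a, b) for x in xs]
```

In a boosting loop the weights `ws` would be raised for misclassified samples before the next fit, so later weak classifiers concentrate on the hard examples.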

The rest of the paper is organized as follows. Related work on pedestrian detection is reviewed in Section 2. The cascade AdaBoost classifier is introduced briefly and its performance is analyzed in Section 3. Our multiplex classifier model is proposed in Section 4, where its learning algorithm and foreground pixel labeling approach are discussed in detail. Experimental results and analysis are presented in Section 5, and conclusions are provided at the end.

Section snippets

Related works

Classifier performance is one of the most important factors in pedestrian detection and has therefore received considerable attention (Xu et al., 2011, Cheng and Jhan, 2013). Freund and Schapire (1997) proposed AdaBoost, a classifier model that constructs a strong classifier as a linear combination of simple weak classifiers. In this model, subsequent classifiers are tweaked in favor of the instances misclassified by previous classifiers. But AdaBoost is sensitive to noisy data and
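
The boosting idea summarized above, a strong classifier built as a weighted sum of weak classifiers with sample weights shifted toward earlier mistakes, can be sketched with discrete AdaBoost over one-dimensional decision stumps. This is a generic textbook-style illustration in plain Python, not the training procedure of the paper; the stump learner, the function names and the toy data are assumptions.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: 'polarity' if x >= threshold, otherwise the opposite label."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, ws):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for w, x, y in zip(ws, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train 'rounds' stumps and return a list of (alpha, threshold, polarity)."""
    n = len(xs)
    ws = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, ws)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        ws = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(ws, xs, ys)
        ]
        total = sum(ws)
        ws = [w / total for w in ws]
    return ensemble


def strong_predict(x, ensemble):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (assumed): no single stump separates these labels.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

ensemble = adaboost(xs, ys, rounds=3)
predictions = [strong_predict(x, ensemble) for x in xs]
```

On this toy set a single stump misclassifies at least two samples, while the three-stump combination labels all six correctly, which is the effect the linear combination is meant to achieve.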

Cascade Adaboost classifier

Viola and Jones (2001) proposed the AdaBoost cascade classifier, which is built by cascading a number of AdaBoost classifiers. Every stage of the cascade classifier either rejects the window or passes it to the next stage. Only the last stage can finally accept a window, whereas rejection may happen at any stage (Wojnarski, 2007, Landesa-Vázquez and Alba-Castro, 2012). According to the cascade AdaBoost classifier training theory, the probability that a rejected sample is a positive sample at stage K
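
The reject-or-pass behavior of a cascade can be sketched as a loop with early exit. The stage functions and thresholds below are hypothetical stand-ins for trained AdaBoost stages, and windows are flat lists of pixel values chosen only for illustration. The helper `overall_rate` uses the standard property of such cascades that the overall detection rate and false positive rate are the products of the per-stage rates, assuming every stage sees only what earlier stages passed.

```python
import math


def cascade_classify(window, stages):
    """Return (accepted, index), where index is the rejecting stage or len(stages)."""
    for index, (score_fn, threshold) in enumerate(stages):
        if score_fn(window) < threshold:
            return False, index  # rejection can happen at any stage
    return True, len(stages)  # only passing the last stage accepts the window


def overall_rate(per_stage_rates):
    """Product of per-stage rates (detection or false positive)."""
    return math.prod(per_stage_rates)


def mean_brightness(window):
    """Hypothetical cheap first stage."""
    return sum(window) / len(window)


def contrast(window):
    """Hypothetical second stage."""
    return max(window) - min(window)


stages = [(mean_brightness, 50), (contrast, 100)]

results = {
    "dark": cascade_classify([10, 12, 11, 9], stages),
    "flat": cascade_classify([200, 210, 205, 198], stages),
    "textured": cascade_classify([20, 240, 30, 220], stages),
}

# Ten stages, each keeping 99% of positives and passing 50% of negatives.
detection_rate = overall_rate([0.99] * 10)
false_positive_rate = overall_rate([0.5] * 10)
```

Because most background windows fail an early, cheap stage, the average cost per window stays low even when later stages are expensive.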

Multiplex cascade classifier

In this section, we present our multiplex cascade classifier. We first discuss the classifier framework and then illustrate the learning algorithm for the proposed multiplex classifier. To reduce detection time and cost, we further propose a foreground pixel labeling algorithm that narrows the image search space.
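
The layout of the structure table is not described in this excerpt. As a loose sketch of how background differences can narrow the search space, the code below marks a pixel as foreground when it differs from a reference background frame by more than a threshold, then keeps only the sliding windows that contain enough foreground pixels. The binary mask stands in for the structure table, and the threshold, window size and frames are assumed values.

```python
def label_foreground(frame, background, threshold):
    """Return a binary mask: 1 where |frame - background| exceeds the threshold."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def candidate_windows(mask, size, min_fraction):
    """Top-left (x, y) of each size-by-size window with enough foreground pixels."""
    height, width = len(mask), len(mask[0])
    needed = min_fraction * size * size
    hits = []
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            count = sum(
                mask[y + dy][x + dx] for dy in range(size) for dx in range(size)
            )
            if count >= needed:
                hits.append((x, y))
    return hits


# Toy frames (assumed): a uniform background and a 2x2 bright object.
background = [[100] * 4 for _ in range(4)]
frame = [
    [100, 100, 100, 100],
    [100, 100, 180, 180],
    [100, 100, 180, 180],
    [100, 100, 100, 100],
]

mask = label_foreground(frame, background, threshold=30)
windows = candidate_windows(mask, size=2, min_fraction=0.75)
```

Here only one of the nine possible 2x2 windows survives, so a detector run afterwards would evaluate a single window instead of all nine.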

Experiment results and discussions

To illustrate the effectiveness and performance of the proposed classifier model, we consider several test instances. Some were collected from pedestrian detection benchmark datasets, and others were obtained from our own video surveillance system. In our experiments, the single cascade classifier and the simple cascade classifier served as baselines for comparison with our multiplex cascade classifier.

Conclusions

In this paper, we presented a multiplex classifier model composed of two multiplex cascade parts: a Haar-like cascade classifier and a shapelet cascade classifier. The Haar-like cascade classifier filters out most of the irrelevant image background, while the shapelet cascade classifier intensively detects head-shoulder features. The weighted linear regression model was introduced to train its weak classifiers. We further introduced a structure table to label the foreground pixels by means

Acknowledgments

The authors thank Lixia Wang and Xiaoqing Ding for their scientific collaboration in this research work. This work is partly supported by the National Natural Science Foundation of China (Grant Nos. 61074029, 61173035), the Program for New Century Excellent Talents in University (Grant No. NCET-11-0861), and the Natural Science Foundation of Liaoning Province (Grant No. 20102014).

References (32)

  • P. Felzenszwalb et al. Cascade object detection with deformable part models.

  • W. Gao et al. Adaptive contour features in oriented granular space for human detection and segmentation.

  • D. Gavrila et al. Multi-cue pedestrian detection and tracking from a moving vehicle. International Journal of Computer Vision (2007).

  • D. Geronimo et al. Survey of pedestrian detection for advanced driver assistance systems. IEEE Transactions on Pattern Analysis and Machine Intelligence (2010).

  • I. Haritaoglu et al. W4: real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence (2000).

  • Y. Hou et al. People counting and human detection in a challenging situation. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans (2011).

Cited by (18)

    • Scene-dependent proposals for efficient person detection

      2019, Pattern Recognition
      Citation Excerpt :

      In order to reduce the cost of feature extraction in the pyramid, Dollár et al. extended the ICF into the Aggregate Channel Feature (ACF) [15], where distinct channel features obtained from block of pixels are aggregated. Other researchers have proposed new classifier architectures that perform effective classification at a reduced computational cost [20–22,27,28]. Bourdev and Brandt proposed the Soft Cascade [20] where detections are evaluated at each node by also taking into account the weak-classifiers responses of the previous nodes.

    • Comparison of 2D image models in segmentation performance for 3D laser point clouds

      2017, Neurocomputing
      Citation Excerpt :

      Moreover, among 2D image models, BA image is a proper choice for data acquired with a fixed laser scanner while FOBA image is more suitable for those captured with a moving laser scanner. The usage of 2D image models is not limited in scene segmentation [19,20]. Various features can be extracted from the 2D images converted from 3D laser point clouds and can also be utilized in outdoor scene classification and understanding with some learning algorithms [21–23].

    • Progressive subspace ensemble learning

      2016, Pattern Recognition
      Citation Excerpt :

      For example, Rasheed et al. [27] used the classifier ensemble approach for electromyographic signal decomposition. Tian et al. [28] designed a classifier ensemble consisting of Haar-like and shapelet components for pedestrian detection. Guo et al. [29] proposed a two-stage pedestrian detection algorithm using AdaBoost and SVM.

    • An effective learning strategy for cascaded object detection

      2016, Information Sciences
      Citation Excerpt :

      This concept, however, cannot be applied to all the categories of “things” present in the images. Several real-world applications [2,7,26,29,30] deal with objects that are not distinguishable from the background since they are not neatly different from their surroundings and are not unique within the image. This situation can depend on many factors such as the characteristics of the employed sensor, the size of the objects or the resolution of the images at hand.
