
Neurocomputing

Volume 214, 19 November 2016, Pages 758-766

Traffic sign detection and recognition using fully convolutional network guided proposals

https://doi.org/10.1016/j.neucom.2016.07.009

Abstract

Detecting and recognizing traffic signs is a hot topic in the field of computer vision with many applications, e.g., safe driving, path planning and robot navigation. We propose a novel framework with two deep learning components: fully convolutional network (FCN) guided traffic sign proposals and a deep convolutional neural network (CNN) for object classification. Our core idea is to use the CNN to classify traffic sign proposals, yielding fast and accurate traffic sign detection and recognition. Due to the complexity of traffic scenes, we improve the state-of-the-art object proposal method, EdgeBox, by incorporating a trained FCN. The FCN guided object proposals produce more discriminative candidates, which helps make the whole detection system fast and accurate. In the experiments, we evaluate the proposed method on a publicly available traffic sign benchmark, the Swedish Traffic Signs Dataset (STSD), and achieve state-of-the-art results.

Introduction

Traffic scene analysis is an important topic in computer vision and intelligent systems [1], [2], [3], [4], [5], [6]. Traffic signs are designed to inform drivers of the current road conditions and other important information. They are rigid, simple shapes with eye-catching colors, and the information they carry is easy to understand. However, accidents may still occur when drivers do not notice a traffic sign in time. Hence, it is very important to design an automatic real-time driver assistance system to detect and recognize traffic signs.

Although the research community and industry in the field of computer vision have made significant progress in traffic sign detection and recognition in the past decades, two main difficulties remain. One is poor image quality due to low resolution, motion blur and noise. The other is uncontrolled environmental factors, including weather conditions, complex backgrounds, variable illumination and sign color fading. Fig. 1 shows some difficult examples. These factors keep traffic sign detection and recognition an open problem.

To tackle these problems, we propose a novel and efficient method to detect and recognize traffic signs. Traffic signs have a distinctive property: they always appear on the two sides of the road being traveled. Taking advantage of this property, the proposed method extracts traffic sign proposals under the guidance of a fully convolutional network, unlike traditional traffic sign detection and recognition methods, which usually start from color segmentation [7], [8], [9], [10], [11], shape detection [12], [13], [14], [15] or sliding-window scanning [16], [17] to find traffic sign regions. This effectively reduces the search range for traffic signs, thereby reducing the number of proposals and improving efficiency.

The pipeline of the proposed method for traffic sign detection is illustrated in Fig. 2; it is mainly composed of two parts. One is the proposal extraction stage guided by a fully convolutional network [18] and EdgeBox [19]. The other is the traffic sign classification stage using a convolutional neural network [20]. Given a scene image, coarse regions of traffic signs are first generated by the fully convolutional network. Then, traffic sign proposals are extracted by EdgeBox from the coarse sign regions. Finally, traffic signs are identified and false positives are eliminated by a trained convolutional neural network classifier, and the optimal bounding boxes are retained by non-maximum suppression.
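The last step of the pipeline, non-maximum suppression, greedily keeps the highest-scoring box and discards lower-scoring boxes that overlap it too much. A minimal sketch in Python (the 0.5 overlap threshold is illustrative, not a value taken from the paper):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection-over-union of two boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # highest-scoring remaining box
        keep.append(i)
        # Drop every remaining box that overlaps it above the threshold.
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the two overlapping boxes collapse to one
```

The second box overlaps the first with IoU ≈ 0.82, so only the higher-scoring of the pair survives.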

Unlike previous methods for both traffic sign detection and general object detection, a novel FCN guided object proposal method is proposed. Since pixel-level annotation is unavailable for traffic sign detection, we propose to train the FCN using bounding-box-level annotation, which is a typical form of weak supervision for a semantic segmentation method like FCN.
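To illustrate the weak supervision idea, bounding-box annotations can be rasterized into per-pixel training targets by marking every pixel inside an annotated box as foreground. A minimal NumPy sketch; `boxes_to_mask` is a hypothetical helper, and the exact target encoding used in the paper may differ:

```python
import numpy as np

def boxes_to_mask(height, width, boxes):
    """Binary foreground mask from bounding-box annotations.

    boxes: iterable of (x1, y1, x2, y2) in pixel coordinates.
    Pixels inside any box are labeled 1 (sign), the rest 0 (background).
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = 1
    return mask

mask = boxes_to_mask(100, 100, [(10, 20, 30, 40), (50, 50, 60, 70)])
print(mask.sum())  # 20*20 + 10*20 = 600 foreground pixels
```

The resulting mask is coarser than a true segmentation label, which is exactly why this counts as weak supervision.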

The proposed method is validated on a publicly available database, the Swedish Traffic Signs Dataset (STSD) [21]. The experimental results demonstrate that the proposed method achieves a much higher detection rate than other methods while producing fewer false positives. In addition, the proposed method is computationally efficient and can potentially be applied in real-time applications by utilizing the computational power of GPU devices.

Object detection using proposals is popular in the recent literature, especially after the R-CNN [22] series of works achieved state-of-the-art object detection performance on PASCAL VOC [23] and the ILSVRC detection task [24]. The proposed method follows R-CNN, but with at least three major differences: (1) we extend R-CNN to a new application, the challenging traffic sign detection problem; (2) we adopt a much faster object proposal method, EdgeBox, rather than selective search [25], which is more accurate but slower; (3) we propose a novel FCN guided object proposal method that is both fast and accurate.

In summary, the main contributions of this work are threefold. First, this paper proposes a new framework for traffic sign detection and recognition based on object proposals, which works in a coarse-to-fine manner. Second, guided by the heat maps of traffic signs generated by the fully convolutional network, the search area for signs is drastically narrowed. Third, an efficient multi-class CNN model is trained for sign recognition with a bootstrapping strategy. The proposed method achieves state-of-the-art performance on the traffic sign benchmark.
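The bootstrapping strategy mentioned in the third contribution is a form of hard negative mining: repeatedly retrain the classifier and feed its highest-scoring background windows back in as negatives. A generic sketch with stand-in `train` and `score` callables (the paper's actual CNN training loop is more involved):

```python
def bootstrap_negatives(train, score, candidates, rounds=3, per_round=2):
    """Generic hard-negative mining loop.

    train(negatives) -> model; score(model, sample) -> float.
    Each round, the highest-scoring remaining candidates (i.e. the
    background windows the current model most confuses with signs)
    are added to the negative training set.
    """
    negatives, pool = [], list(candidates)
    for _ in range(rounds):
        model = train(negatives)
        pool.sort(key=lambda s: score(model, s), reverse=True)
        hard, pool = pool[:per_round], pool[per_round:]
        negatives.extend(hard)
        if not pool:
            break
    return negatives

# Stand-in model: "train" ignores its input, and "score" rates a candidate
# by its own value, so the largest values count as the hardest negatives.
negs = bootstrap_negatives(
    train=lambda negs: None,
    score=lambda model, s: s,
    candidates=[0.1, 0.9, 0.4, 0.8, 0.2, 0.7],
)
print(negs)  # hardest first: [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
```

In the paper's setting, `candidates` would be background proposals and `score` the CNN's sign confidence; here both are toy stand-ins.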

The remainder of this paper is organized as follows: Section 2 presents recent work related to the proposed method. Section 3 details the proposed method, including coarse sign region segmentation, proposal extraction and sign classification. In Section 4, the experimental results and discussions are presented. Finally, conclusions and future work are given in Section 5.


Related work

Over the last decade, research in traffic sign detection and recognition has grown rapidly. A large number of novel ideas and effective methods have been proposed [7], [8], [9], [10], [12], [13], [16], [17], [21], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]. Usually, the detection part locates potential regions of traffic signs and the recognition part determines the category of each sign.

The previous traffic sign detection methods can be divided into

Approach

In this section, we present the details of the detection pipeline illustrated in Fig. 2. It includes two main components: the novel proposal extraction component and the CNN classification component, with training and testing details.

Dataset and evaluation protocol

The proposed method is evaluated on a publicly available database, the Swedish Traffic Signs Dataset (STSD) [21]. STSD consists of two sets (Set1 and Set2) containing more than 20,000 frames in total, recorded on Swedish highways and city roads; every fifth frame has been manually labeled. Part0 of each set is annotated (roughly 2000 images), while Part1–Part4 are not. Some examples from STSD are shown in Fig. 5. The annotations for each image present status, location, type, and
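Detection rate and false positives on such annotations are typically computed by matching each detection to an as-yet-unmatched ground-truth box at an IoU threshold. A minimal sketch; the exact matching protocol used for STSD may differ:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection-over-union of two boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)

def match_detections(dets, gts, thresh=0.5):
    """Greedily match detections to ground truth at an IoU threshold.

    Returns (true_positives, false_positives, missed_ground_truths).
    """
    unmatched = list(range(len(gts)))
    tp = fp = 0
    for d in dets:
        best = max(unmatched, key=lambda g: iou(d, gts[g]), default=None)
        if best is not None and iou(d, gts[best]) >= thresh:
            unmatched.remove(best)   # each ground truth matched at most once
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)

dets = [(10, 10, 50, 50), (200, 200, 240, 240)]
gts = [(12, 12, 52, 52)]
print(match_detections(dets, gts))  # (1, 1, 0): one hit, one false positive
```

Detection rate is then `tp / len(gts)` aggregated over all annotated frames.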

Conclusions

In this paper, we propose a novel and efficient method to detect and recognize traffic signs. The main contributions of this paper are the following: (1) We propose a new framework for traffic sign detection and recognition based on proposals guided by a fully convolutional network, which largely reduces the search area for traffic signs while maintaining the detection rate. (2) We extend R-CNN to a new application, the challenging traffic sign detection and recognition

Yingying Zhu was born in 1988. She received the B.S. degree in electronics and information engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2011. She is currently a Ph.D. student with the School of Electronic Information and Communications, HUST. Her research areas mainly include text/traffic sign detection and recognition in natural images.

References (56)

  • F. Zaklouta et al., Real-time traffic sign recognition in three stages, Robot. Auton. Syst. (2014)
  • Z.-L. Sun et al., Application of BW-ELM model on traffic sign recognition, Neurocomputing (2014)
  • F. Yang et al., Dynamic texture recognition by aggregating spatial and temporal features via ensemble SVMs, Neurocomputing (2016)
  • B. Shi et al., Script identification in the wild via discriminative convolutional neural network, Pattern Recognit. (2016)
  • Z. Zhang et al., Practical camera calibration from moving objects for traffic scene surveillance, IEEE Trans. Circuits Syst. Video Technol. (2013)
  • C. Yao, X. Bai, W. Liu, L. Latecki, Human detection using learned part alphabet and pose dictionary, in: Proceedings of...
  • C. Hu et al., Learning discriminative pattern for real-time car brand recognition, IEEE Trans. Intell. Transp. Syst. (2015)
  • A. De La Escalera et al., Road traffic sign detection and classification, IEEE Trans. Ind. Electron. (1997)
  • S. Maldonado-Bascón et al., Road-sign detection and recognition based on support vector machines, IEEE Trans. Intell. Transp. Syst. (2007)
  • M.Á. García-Garrido, M.Á. Sotelo, E. Martín-Gorostiza, Fast road sign detection using Hough transform for assisted...
  • G. Loy, N. Barnes, Fast shape-based road sign detection for a driver assistance system, in: 2004 Proceedings of...
  • X. Bai et al., 3D shape matching via two layer coding, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • I.M. Creusen, R.G. Wijnhoven, E. Herbschleb, P. De With, Color exploitation in HOG-based traffic sign detection, in:...
  • X. Baró et al., Traffic sign recognition using evolutionary AdaBoost detection and forest-ECOC classification, IEEE Trans. Intell. Transp. Syst. (2009)
  • J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of CVPR,...
  • C.L. Zitnick, P. Dollár, Edge boxes: locating object proposals from edges, in: Proceedings of ECCV, Springer, Zurich,...
  • A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Advances...
  • F. Larsson, M. Felsberg, Using Fourier descriptors and spatial models for traffic sign recognition, in: Image Analysis,...

Chengquan Zhang was born in 1990. He received the B.S. degree in electronics and information engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2014. He is currently a Master's student with the School of Electronic Information and Communications, HUST. His main research interests include image classification and scene text detection.

Duoyou Zhou was born in 1993. He received the B.S. degree from the College of Information Science & Technology, Hainan University (HNU), Haikou, China, in 2015. He is currently a Master's student with the School of Electronic Information and Communications, HUST. His main research interests include object segmentation and scene text detection.

Xinggang Wang is an Assistant Professor with the School of Electronic Information and Communications, Huazhong University of Science and Technology. He received his Bachelor's degree in communication and information system and his Ph.D. degree in computer vision, both from Huazhong University of Science and Technology. From May 2010 to July 2011, he was with the Department of Computer and Information Science, Temple University, Philadelphia, PA, as a visiting scholar. From February 2013 to September 2013, he was with the University of California, Los Angeles, as a Visiting Graduate Researcher. He is a reviewer for IEEE Transactions on Cybernetics, Pattern Recognition, Computer Vision and Image Understanding, Neurocomputing, CVPR, ICCV, ECCV, etc. His research interests include computer vision and machine learning.

    Bai Xiang is currently a Professor with the School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST). He received the B.S., M.S., and Ph.D. degrees from HUST in 2003, 2005, and 2009 respectively. His research interests include computer vision and pattern recognition, specifically including object recognition, shape analysis, scene text recognition and intelligent systems. He is now serving as the Editorial Board Member of Pattern Recognition Letters, Neurocomputing, Frontier of Computer Science.

    Wenyu Liu was born in 1963. He received the B.S. degree in Computer Science from Tsinghua University, Beijing, China, in 1986, and the M.S. and Ph.D. degrees, both in Electronics and Information Engineering, from Huazhong University of Science & Technology (HUST), Wuhan, China, in 1991 and 2001, respectively. He is now a professor and associate dean of the School of Electronic Information and Communications, HUST. His current research areas include computer vision, multimedia, and sensor network. He is a senior member of IEEE.
