Review
A comprehensive and systematic review on classical and deep learning based region proposal algorithms

https://doi.org/10.1016/j.eswa.2021.116105

Highlights

  • A comprehensive review of recent works on region proposal algorithms is presented.

  • A taxonomy of region proposals is presented.

  • Advantages and disadvantages of different categories in this area are discussed.

  • Applications in different areas including natural and medical images are reviewed.

  • Different challenges and open questions are pointed out.

Abstract

The development of region proposal algorithms has rapidly become one of the most critical research areas in recent years. The high accuracy of region-based recognition techniques has made proposal algorithms a core component of various recognition problems. The main purpose of these algorithms is to extract an appropriate number of effective regions from an image, which reduces the search space and increases detection accuracy. Early algorithms were based on sets of hand-crafted features. More recently, advances in deep learning have been widely and successfully applied to region proposal generation. This paper reviews region proposal algorithms, their theory, and evaluation metrics, and also addresses the existing challenges. In addition, we present a classification of proposal generation methods into classical methods, based on hand-crafted features, and advanced methods, based on deep learning. Both categories are described in detail, and an extensive review of recent works is presented. Proposal improvement methods, including ranking algorithms, are also described. In total, more than 60 different algorithms have been studied and classified, and we also point out several applications based on region proposals.

Introduction

In the last few years, region proposal has been used as a key method for performing various visual recognition tasks such as text extraction (Gómez and Karatzas, 2017, Nguyen et al., 2017), object detection in natural images (Girshick, 2015, Girshick et al., 2014, He et al., 2015), traffic sign recognition (Ku, Mozian, Lee, Harakeh, & Waslander, 2018), object detection in medical images (Akselrod-Ballin et al., 2016, Mansoor et al., 2019), instance semantic segmentation (Liang, Lin, Wei, Shen, Yang, & Yan, 2017), salient object detection (Guo, Wang, Shen, Shao, Yang, Tao, et al., 2017), and object segmentation (Hariharan et al., 2014, Liu et al., 2018). Given the remarkable results of region-based methods on standard datasets such as ImageNet (Deng, Dong, Socher, Li, Li, & Fei-Fei, 2009), Pascal VOC (Everingham, Eslami, Van Gool, Williams, Winn, & Zisserman, 2015), and Microsoft COCO (Lin et al., 2014), region proposal can play a central role in visual recognition applications. Indeed, region proposal is gaining research popularity, and numerous studies have developed and improved different region proposal algorithms.

The sliding window technique has been widely used in many applications (Dalal & Triggs, 2005), and it has yielded considerable improvements in visual recognition problems (Felzenszwalb, Girshick, McAllester, & Ramanan, 2009). However, the technique processes millions of windows, so classifying all of them is exceptionally inefficient. Regular grids, fixed scales, and fixed aspect ratios are used to obtain a reduced set of windows. Although they decrease the number of regions, the search space remains huge. Hence, it is essential to impose constraints on the search space, for example with the branch and bound technique. Despite significantly decreasing the number of regions and being highly efficient for linear classifiers (Lampert, Blaschko, & Hofmann, 2009), such approaches failed to lower the computation time effectively.
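To make the size of this search space concrete, the following minimal sketch counts the windows visited by a sliding window search over a grid of fixed scales and aspect ratios. The function name, the base window size, and the default scales, aspect ratios, and stride are illustrative assumptions and are not taken from any of the cited works.

```python
def count_sliding_windows(img_w, img_h, base_size=64,
                          scales=(1.0, 1.5, 2.0),
                          aspect_ratios=(0.5, 1.0, 2.0),
                          stride=4):
    """Count windows on a regular grid (hypothetical helper for illustration).

    Each (scale, aspect ratio) pair defines one window shape whose area is
    roughly (base_size * scale) ** 2 and whose width/height ratio is the
    aspect ratio. The shape is then slid over the image with the given stride.
    """
    total = 0
    for s in scales:
        for ar in aspect_ratios:
            w = int(round(base_size * s * ar ** 0.5))
            h = int(round(base_size * s / ar ** 0.5))
            # Skip degenerate shapes and shapes that do not fit in the image.
            if w < 1 or h < 1 or w > img_w or h > img_h:
                continue
            positions_x = (img_w - w) // stride + 1
            positions_y = (img_h - h) // stride + 1
            total += positions_x * positions_y
    return total


if __name__ == "__main__":
    # A denser grid (smaller stride) multiplies the number of windows.
    for stride in (8, 4, 1):
        print(stride, count_sliding_windows(640, 480, stride=stride))
```

Under these assumptions, the count grows roughly with the inverse square of the stride and linearly with the number of scale and aspect ratio combinations, which is why a fixed grid reduces the number of windows but still leaves a very large search space.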

A superior alternative is image segmentation. The first region-based approach to object detection was proposed by Gu, Lim, Arbeláez, and Malik (2009), after which Endres and Hoiem introduced a category-independent algorithm for object detection (Endres & Hoiem, 2010). Arbeláez et al. developed a semantic segmentation algorithm in which hierarchical segmentation, based on a contour detector, provides the initial regions (Arbeláez, Hariharan, Gu, Gupta, Bourdev, & Malik, 2012). The algorithm performed well on semantic image segmentation, especially for a number of specific objects. The region-based approach is also known as the region proposal algorithm, which extracts an adequate number of regions from the image and has been used in various computer vision problems. It extracts far fewer regions than the previous techniques: the sliding window technique evaluates more than one million windows, whereas the region proposal method extracts 100000 regions or fewer (Alexe et al., 2010, Carreira and Sminchisescu, 2011, Pont-Tuset et al., 2016, Uijlings et al., 2013). The efficiency of region proposal generation is another significant issue. The quality of the regions also matters because it determines how effective the proposals are in recognition problems. The number of regions produced by proposal algorithms is manageable, which facilitates the use of powerful classifiers per region and, in turn, improves performance and decreases the error rate. For example, region proposal methods have achieved a much higher accuracy than the Deformable Parts Model (DPM) technique (Felzenszwalb, McAllester, & Ramanan, 2008) in visual recognition problems (Girshick et al., 2014). In addition, the combination of region proposal and deep learning techniques has shown outstanding results in computer vision. There are also deep learning-based approaches, e.g., You Only Look Once (YOLO) (Redmon, Divvala, Girshick, & Farhadi, 2016) and YOLO9000 (Redmon & Farhadi, 2017), which have been implemented efficiently without region proposals and are suitable for real-time applications. Despite their high-speed computation, they do not match region proposal-based methods in terms of accuracy.
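The notion of proposal quality can be illustrated with a small sketch. It assumes that quality is measured by recall at an intersection-over-union (IoU) threshold, a metric commonly used for proposals; this excerpt does not name the metric, so the choice of IoU, the 0.5 threshold, the `(x1, y1, x2, y2)` box format, and the function names are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(ground_truth, proposals, threshold=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal.

    A ground-truth box counts as covered when some proposal overlaps it
    with IoU >= threshold. Returns 0.0 when there are no ground-truth boxes.
    """
    if not ground_truth:
        return 0.0
    covered = sum(
        1 for gt in ground_truth
        if any(iou(gt, p) >= threshold for p in proposals)
    )
    return covered / len(ground_truth)


if __name__ == "__main__":
    gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
    candidate_boxes = [(0, 0, 10, 9), (50, 50, 60, 60)]
    print(recall_at_iou(gt_boxes, candidate_boxes, threshold=0.5))
```

Evaluating this recall for a growing number of proposals gives one way to relate the number of regions to their quality: a good algorithm reaches high recall with few proposals.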

Given the impressive performance of region proposal algorithms in computer vision applications, it is necessary to introduce and analyze their theory. In addition, deep learning techniques have had a considerable influence on the growth of region proposal in recent years. Region proposal is a well-accepted approach in image processing and computer vision. Therefore, this article presents a comprehensive review of the recent progress of region proposal algorithms along with their theory and concepts. As stated earlier, region proposal is used as a practical approach in various computer vision applications. Several review articles have been presented on computer vision problems. Although most of them mention region-based approaches to computer vision tasks, including object detection and semantic segmentation, their authors have not investigated region proposal algorithms in detail. Review papers presented over the past five years are summarized in Table 1. As can be seen, they describe several region proposal algorithms only briefly. In contrast, only one review extensively states and analyzes region proposal algorithms, and it covers works up to 2015 (Hosang, Benenson, Dollár, & Schiele, 2015). Its authors conducted an in-depth analysis of 12 different algorithms along with their impact on object detection. Furthermore, Chavali et al. provided an extensive survey on classical region proposals and evaluation metrics (Chavali, Agrawal, Mahendru, & Batra, 2016). They reviewed algorithms up to 2016; however, they did not investigate deep learning-based algorithms. To the best of our knowledge, there is no detailed review of the current developments in region proposals and, more specifically, of the effect of deep learning techniques on them.

Therefore, an overview paper that precisely and comprehensively covers region proposal algorithms, concepts, evaluation metrics, applications, challenges, and future directions seems necessary. Unlike Hosang et al. (2015) and Hosang, Benenson, and Schiele (2014), this paper introduces recent region proposal algorithms and their properties. It also compares them and specifically presents deep learning-based region proposals. We describe the advantages and disadvantages of region proposal generation algorithms. Furthermore, ranking and refinement algorithms for improving region proposals are explained, and several practical examples are pointed out. We intend this survey to help readers and researchers better understand region proposal, its strengths and weaknesses, and its open problems. Our main contributions are: (i) presenting a comprehensive classification of region proposal algorithms and explaining deep learning-based region proposals; (ii) reviewing ranking and refinement algorithms; (iii) pointing to a number of practical examples in various region-based computer vision tasks; and (iv) addressing existing open problems in region proposals.

The structure of the paper is as follows. Section 2 reviews the concepts and theory of region proposal algorithms and explains the challenges of proposals along with evaluation metrics. In Section 3, we classify the region proposal generation methods and review the existing works. Section 4 presents an overview of the ranking algorithms, and Section 5 covers computer vision applications that use region proposals. In Section 6, we discuss and summarize different region proposals and sketch future directions. Finally, Section 7 concludes the paper.

Section snippets

Region proposal: Theory, challenge, and evaluation metrics

Region proposal is used as a preprocessing stage or even the key step in many computer vision problems. The algorithm extracts from an image a pool of appropriate regions that are likely to contain objects. The extracted regions are represented as bounding boxes or segmented candidates. Region proposal methods can also be divided into class-independent and class-specific methods. The class-specific region proposal method is adjusted to capture definite objects. It has shown satisfactory

Region proposal generation

This section provides an overview of region proposal algorithms based on the defined classification. Classical and advanced methods have been developed for proposal generation. Advanced methods are based on deep learning techniques, especially CNNs, whereas classical methods employ low-level features. Classical methods are further divided into window scoring-based and segmentation-based subcategories. Fig. 3 shows the proposed taxonomy for region proposals. Region proposals can be

Ranking algorithms

So far, two practical approaches have been introduced to improve the results of region proposal algorithms. In the first approach, a ranking algorithm ranks the proposals, which can be done in different ways. In the second approach, called the refinement method, the proposals are refined according to defined rules. It should be noted that window scoring and advanced methods generally use ranking techniques that are not considered an independent unit. Moreover, some of

Applications

Region proposal algorithms are used in various computer vision tasks such as object detection, object segmentation, image labeling, and instance semantic segmentation, and have been evaluated on different datasets. Many practical examples have shown satisfactory performance. In this study, we reviewed 65 scientific articles; Fig. 11 shows the percentage of deep learning-based network architectures used in different region-based applications. These network architectures

Discussion and future directions

This section discusses challenges in region proposal algorithms and analyzes and summarizes the different categories of region proposals. Some future directions in this area are also presented.

Owing to the capability of region proposals, the region-based approach has been accepted as a beneficial strategy in visual recognition problems and computer vision tasks. This paper comprehensively reviewed region proposal algorithms in the classical and advanced categories. In the classical category, the window

Conclusion

This paper reviewed region proposal algorithms, one of the most significant research topics of the recent decade. More concretely, more than 60 different algorithms were reviewed in detail and classified into classical and advanced categories. Classical methods rely on low-level features and utilize bottom-up segmentation or the sliding-window technique, and they were implemented on the CPU. Deep

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (172)

  • Achanta, R. et al. (2012). Slic superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Akselrod-Ballin, A. et al. A region based convolutional network for tumor detection and classification in breast mammography.
  • Alexe, B. et al. What is an object?
  • Alexe, B. et al. (2012). Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Arbeláez, P. et al. Semantic segmentation using regions and parts.
  • Arbeláez, P. et al. (2010). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Bell, S., Lawrence Zitnick, C., Bala, K., & Girshick, R. (2016). Inside-outside net: Detecting objects in context with...
  • Caesar, H. et al. Region-based semantic segmentation with end-to-end training.
  • Carreira, J. et al. (2011). Cpmc: Automatic object segmentation using constrained parametric min-cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Chai, D. (2020). Rooted spanning superpixels. International Journal of Computer Vision.
  • Chan, L. et al. (2020). A comprehensive analysis of weakly-supervised semantic segmentation in different image domains. International Journal of Computer Vision.
  • Chavali, N., Agrawal, H., Mahendru, A., & Batra, D. (2016). Object-proposal evaluation protocol is ’gameable’. In...
  • Chen, Y.P. et al. (2018). An enhanced region proposal network for object detection using deep learning method. PLoS One.
  • Chen, C. et al. R-cnn for small object detection.
  • Chen, X., Ma, H., Wang, X., & Zhao, Z. (2015). Improving object proposals with multi-thresholding straddling expansion....
  • Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., & Sun, S., et al. (2019). Hybrid task cascade for instance...
  • Chen, G. et al. (2020). A survey of the four pillars for small object detection: Multiscale representation, contextual information, super-resolution, and region proposal. IEEE Transactions on Systems, Man, and Cybernetics: Systems.
  • Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018a). Encoder–decoder with atrous separable...
  • Cheng, G. et al. (2020). High-quality proposals for weakly supervised object detection. IEEE Transactions on Image Processing.
  • Cheng, M.-M., Zhang, Z., Lin, W.-Y., & Torr, P. (2014). Bing: Binarized normed gradients for objectness estimation at...
  • Cho, M. et al. N-rpn: Hard example learning for region proposal networks.
  • Comaniciu, D. et al. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Dai, J., He, K., & Sun, J. (2015a). Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic...
  • Dai, J., He, K., & Sun, J. (2015b). Convolutional feature masking for joint object and stuff segmentation. In...
  • Dai, J., He, K., & Sun, J. (2016a). Instance-aware semantic segmentation via multi-task network cascades. In...
  • Dai, J. et al. R-fcn: Object detection via region-based fully convolutional networks.
  • Dalal, N. et al. Histograms of oriented gradients for human detection.
  • Danielczuk, M. et al. Segmenting unknown 3d objects from real depth images using mask r-cnn trained on synthetic data.
  • Deng, J. et al. Imagenet: A large-scale hierarchical image database.
  • Endres, I. et al. Category independent object proposals.
  • Erhan, D., Szegedy, C., Toshev, A., & Anguelov, D. (2014). Scalable object detection using deep neural networks. In...
  • Everingham, M. et al. (2015). The pascal visual object classes challenge: A retrospective. International Journal of Computer Vision.
  • Fan, H., & Ling, H. (2019). Siamese cascaded region proposal networks for real-time visual tracking. In Proceedings of...
  • Felzenszwalb, P.F. et al. (2009). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Felzenszwalb, P.F. et al. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision.
  • Felzenszwalb, P. et al. A discriminatively trained, multiscale, deformable part model.
  • Geng, Q. et al. (2018). Survey of recent progress in semantic image segmentation with cnns. Science China. Information Sciences.
  • Ghodrati, A., Diba, A., Pedersoli, M., Tuytelaars, T., & Van Gool, L. (2015). Deepproposal: Hunting objects by...
  • Ghosh, S. et al. (2019). Understanding deep learning techniques for image segmentation. ACM Computing Surveys.
  • Gidaris, S., & Komodakis, N. (2015). Object detection via a multi-region and semantic segmentation-aware cnn model. In...