Skip to main content
Log in

Deep learning in multi-object detection and tracking: state of the art

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Object detection and tracking is one of the most important and challenging branches in computer vision, and have been widely applied in various fields, such as health-care monitoring, autonomous driving, anomaly detection, and so on. With the rapid development of deep learning (DL) networks and GPU’s computing power, the performance of object detectors and trackers has been greatly improved. To understand the main development status of object detection and tracking pipeline thoroughly, in this survey, we have critically analyzed the existing DL network-based methods of object detection and tracking and described various benchmark datasets. This includes the recent development in granulated DL models. Primarily, we have provided a comprehensive overview of a variety of both generic object detection and specific object detection models. We have enlisted various comparative results for obtaining the best detector, tracker, and their combination. Moreover, we have listed the traditional and new applications of object detection and tracking showing its developmental trends. Finally, challenging issues, including the relevance of granular computing, in the said domain are elaborated as a future scope of research, together with some concerns. An extensive bibliography is also provided.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  1. Jiao L, Zhang F, Liu F, Yang S, Li L, Feng Z, Qu R (2019) A survey of deep learning-based object detection. IEEE Access 7:128837–128868

    Article  Google Scholar 

  2. Pal S K (2018) Data science and technology: challenges, opportunities and national relevance. 14th annual convocation speech, national institute of technology, Calicut

  3. Pal S K, Bhoumik D, Chakraborty D B (2020) Granulated deep learning and z-numbers in motion detection and object recognition. Neural Comput Appl 32(21):16533–16548

    Article  Google Scholar 

  4. Chakraborty DB, Pal S K (2021) Granular Video Computing: with Rough Sets, Deep Learning and in IoT. World Scientific, Singapore

  5. Liu Y, Cheng M-M, Hu X, Wang K, Bai X (2017) Richer convolutional features for edge detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3000–3009

  6. Pal S K, King R A (1983) On edge detection of x-ray images using fuzzy sets. IEEE Trans Pattern Anal Mach Intell 5(1):69–77

    Article  Google Scholar 

  7. Deravi F, Pal S K (1983) Grey level thresholding using second-order statistics. Pattern Recogn Lett 1(5-6):417–422

    Article  Google Scholar 

  8. Pal S K, King R A, Hashim AA (1983) Automatic grey level thresholding through index of fuzziness and entropy. Pattern Recogn Lett 1(3):141–146

    Article  Google Scholar 

  9. Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7291–7299

  10. Masi I, Wu Y, Hassner T, Natarajan P (2018) Deep face recognition: A survey. In: 2018 31st SIBGRAPI conference on graphics, patterns and images (SIBGRAPI). IEEE, pp 471–478

  11. Hasan M, Orgun M A, Schwitter R (2018) A survey on real-time event detection from the twitter data stream. J Inf Sci 44(4):443–463

    Article  Google Scholar 

  12. Brunetti A, Buongiorno D, Trotta G F, Bevilacqua V (2018) Computer vision and deep learning techniques for pedestrian detection and tracking: A survey. Neurocomputing 300:17–33

    Article  Google Scholar 

  13. Ren X, Zhou Y, He J, Chen K, Yang X, Sun J (2016) A convolutional neural network-based chinese text detection algorithm via text structure modeling. IEEE Trans Multimed 19(3):506–518

    Article  Google Scholar 

  14. Fan D-P, Wang W, Cheng M-M, Shen J (2019) Shifting more attention to video salient object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8554–8564

  15. Pal N R, Pal S K (1993) A review on image segmentation techniques. Pattern Recogn 26 (9):1277–1294

    Article  Google Scholar 

  16. Dollar P, Wojek C, Schiele B, Perona P (2011) Pedestrian detection: An evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell 34(4):743–761

    Article  Google Scholar 

  17. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 3354–3361

  18. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252

    Article  MathSciNet  Google Scholar 

  19. Everingham M, Van Gool L, Williams Christopher KI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338

    Article  Google Scholar 

  20. Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C L (2014) Microsoft coco: Common objects in context. In: European conference on computer vision. Springer, pp 740–755

  21. Kuznetsova A, Rom H, Alldrin N, Uijlings J, Krasin I, Pont-Tuset J, Kamali S, Popov S, Malloci M, Duerig T et al (2018) The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv:1811.00982

  22. Krizhevsky A, Sutskever I, Hinton G E (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90

    Article  Google Scholar 

  23. Zhang X, Fang Z, Wen Y, Li Z, Qiao Y (2017) Range loss for deep face recognition with long-tailed training data. In: Proceedings of the IEEE International Conference on Computer Vision, pp 5409–5418

  24. Chung D, Tahboub K, Delp E J (2017) A two stream siamese convolutional neural network for person re-identification. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1983–1991

  25. Zhao S, Liu Y, Han Y, Hong R, Hu Q, Tian Q (2017) Pooling the convolutional layers in deep convnets for video action recognition. IEEE Trans Circ Syst Video Technol 28(8):1839–1849

    Article  Google Scholar 

  26. Geng H-, Zhang H, Xue Y-, Zhou M, Xu G-, Gao Z (2017) Semantic image segmentation with fused cnn features. Optoelectron Lett 13(5):381–385

    Article  Google Scholar 

  27. Chakraborty D B, Pal S K (2016) Neighborhood granules and rough rule-base in tracking. Nat Comput 15(3):359–370

    Article  MathSciNet  MATH  Google Scholar 

  28. Chakraborty D B, Pal S K (2017) Neighborhood rough filter and intuitionistic entropy in unsupervised tracking. IEEE Trans Fuzzy Syst 26(4):2188–2200

    Article  Google Scholar 

  29. Pal S K, Chakraborty D B (2016) Granular flow graph, adaptive rule generation and tracking. IEEE Trans Cybern 47(12):4096– 4107

    Article  Google Scholar 

  30. Wang N, Yeung D-Y (2013) Learning a deep compact image representation for visual tracking. In: Advances in neural information processing systems, pp 809–817

  31. Henriques J F, Caseiro R, Martins P, Batista J (2014) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596

    Article  Google Scholar 

  32. Choi J, Jin Chang H, Fischer T, Yun S, Lee K, Jeong J, Demiris Y, Young Choi J (2018) Context-aware deep feature compression for high-speed visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 479–488

  33. Valmadre J, Bertinetto L, Henriques J, Vedaldi A, Torr Philip HS (2017) End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2805–2813

  34. Li J, Zhou X, Chan S, Chen S (2017) Object tracking using a convolutional network and a structured output svm. Comput Vis Media 3(4):325–335

    Article  Google Scholar 

  35. Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302

  36. Danelljan M, Robinson A, Khan F S, Felsberg M (2016) Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: European conference on computer vision. Springer, pp 472–488

  37. Ma C, Huang J-B, Yang X, Yang M-H (2015) Hierarchical convolutional features for visual tracking. In: Proceedings of the IEEE international conference on computer vision, pp 3074–3082

  38. Milan A, Rezatofighi S H, Dick A, Reid I, Schindler K (2016) Online multi-target tracking using recurrent neural networks. arXiv:1604.03635

  39. Li P, Wang D, Wang L, Lu H (2018) Deep visual tracking: Review and experimental comparison. Pattern Recogn 76:323–338

    Article  Google Scholar 

  40. Xu Y, Zhou X, Chen S, Li F (2019) Deep learning for multiple object tracking: a survey. IET Comput Vis 13(4):355–368

    Article  Google Scholar 

  41. Leal-Taixé L, Milan A, Schindler K, Cremers D, Reid I, Roth S (2017) Tracking the trackers: an analysis of the state of the art in multiple object tracking. arXiv:1704.02781

  42. Zhao Z-Q, Zheng P, Xu S-, Wu X (2019) Object detection with deep learning: A review. IEEE Trans Neural Networks Learn Syst 30(11):3212–3232

    Article  Google Scholar 

  43. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

  44. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  45. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

  46. Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271

  47. Weinzaepfel P, Revaud J, Harchaoui Z, Schmid C (2013) Deepflow: Large displacement optical flow with deep matching. In: Proceedings of the IEEE international conference on computer vision, pp 1385–1392

  48. Cheng H Y, Hwang J N (2007) Multiple-target tracking for crossroad traffic utilizing modified probabilistic data association. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07, vol 1. IEEE, pp I–921

  49. Lim Y-C, Lee M, Lee C-H, Kwon S, Lee J- (2010) Improvement of stereo vision-based position and velocity estimation and tracking using a stripe-based disparity estimation and inverse perspective map-based extended kalman filter. Opt Lasers Eng 48(9):859–868

    Article  Google Scholar 

  50. Cao X, Lan J, Yan P, Li X (2012) Vehicle detection and tracking in airborne videos by multi-motion layer analysis. Mach Vis Appl 23(5):921–935

    Article  Google Scholar 

  51. Kim C, Li F, Ciptadi A, Rehg J M (2015) Multiple hypothesis tracking revisited. In: Proceedings of the IEEE international conference on computer vision, pp 4696–4704

  52. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  53. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125

  54. Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2018) Detnet: A backbone network for object detection. arXiv:1804.06215

  55. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969

  56. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1492–1500

  57. Ghiasi G, Lin T-Y, Le Q V (2019) Nas-fpn: Learning scalable feature pyramid architecture for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7036–7045

  58. Howard A G, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861

  59. Iandola F N, Han S, Moskewicz M W, Ashraf K, Dally W J, Keutzer K (2016) Squeezenet: Alexnet-level accuracy with 50x fewer parameters and< 0.5 mb model size. arXiv:1602.07360

  60. Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258

  61. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520

  62. Rawat W, Wang Z (2017) Deep convolutional neural networks for image classification: A comprehensive review. Neural Comput 29(9):2352–2449

    Article  MathSciNet  MATH  Google Scholar 

  63. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

  64. Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387

  65. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  66. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  67. Felzenszwalb P F, Girshick R B, McAllester D, Ramanan D (2009) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645

    Article  Google Scholar 

  68. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916

    Article  Google Scholar 

  69. Bell S, Lawrence Zitnick C, Bala K, Girshick R (2016) Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883

  70. Liu J, Zhang S, Wang S, Metaxas D N (2016) Multispectral deep neural networks for pedestrian detection. arXiv:1611.02644

  71. Zadeh L A (2011) A note on z-numbers. Inf Sci 181(14):2923–2932

    Article  MATH  Google Scholar 

  72. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv:1804.02767

  73. Fu C-Y, Liu W, Ranga A, Tyagi A, Berg A C (2017) Dssd: Deconvolutional single shot detector. arXiv:1701.06659

  74. Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988

  75. Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2019) M2det: A single-shot object detector based on multi-level feature pyramid network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 33, pp 9259–9266

  76. Zhang S, Wen L, Bian X, Lei Z, Li S Z (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4203–4212

  77. Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 764–773

  78. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167

  79. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg A C (2016) Ssd: Single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37

  80. Zhu X, Hu H, Lin S, Dai J (2019) Deformable convnets v2: More deformable, better results. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 9308–9316

  81. Yang Z, Nevatia R (2016) A multi-scale cascade fully convolutional network face detector. In: 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, pp 633–638

  82. Tu W-C, He S, Yang Q, Chien S-Y (2016) Real-time salient object detection with a minimum spanning tree. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2334–2342

  83. Yang J, Yang M-H (2016) Top-down visual saliency via joint crf and dictionary learning. IEEE Trans Pattern Anal Mach Intell 39(3):576–588

    Article  MathSciNet  Google Scholar 

  84. Tomè D, Monti F, Baroffio L, Bondi L, Tagliasacchi M, Tubaro S (2016) Deep convolutional neural networks for pedestrian detection. Signal Process Image Commun 47:482–489

    Article  Google Scholar 

  85. Zhao Z-Q, Bian H, Hu D, Cheng W, Glotin H (2017) Pedestrian detection based on fast r-cnn and batch normalization. In: International Conference on Intelligent Computing. Springer, pp 735–746

  86. Rother C, Bordeaux L, Hamadi Y, Blake A (2006) Autocollage. ACM Trans Graph (TOG) 25(3):847–852

    Article  Google Scholar 

  87. Chakraborty D, Shankar B U, Pal S K (2013) Granulation, rough entropy and spatiotemporal moving object detection. Appl Soft Comput 13(9):4001–4009

    Article  Google Scholar 

  88. Pal S K, Mitra P (2002) Multispectral image segmentation using the rough-set-initialized em algorithm. IEEE Trans Geosci Remote Sens 40(11):2495–2501

    Article  Google Scholar 

  89. Pal S K, Shankar B U, Mitra P (2005) Granular computing, rough entropy and object extraction. Pattern Recogn Lett 26(16):2509–2517

    Article  Google Scholar 

  90. Rosin P L (2009) A simple method for detecting salient regions. Pattern Recogn 42(11):2363–2371

    Article  MATH  Google Scholar 

  91. Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum H-Y (2010) Learning to detect a salient object. IEEE Trans Pattern Anal Mach Intell 33(2):353–367

    Google Scholar 

  92. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

  93. Gao D, Han S, Vasconcelos N (2009) Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition. IEEE Trans Pattern Anal Mach Intell 31(6):989–1005

    Article  Google Scholar 

  94. Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403

  95. Vig E, Dorr M, Cox D (2014) Large-scale optimization of hierarchical features for saliency prediction in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2798–2805

  96. Huang X, Shen C, Boix X, Zhao Q (2015) Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp 262–270

  97. Wang L, Lu H, Ruan X, Yang M-H (2015) Deep networks for saliency detection via local estimation and global search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3183–3192

  98. Cholakkal H, Johnson J, Rajan D (2018) Backtracking spatial pyramid pooling-based image classifier for weakly supervised top–down salient object detection. IEEE Trans Image Process 27(12):6064–6078

    Article  MathSciNet  Google Scholar 

  99. He S, Lau RWH, Liu W, Huang Z, Yang Q (2015) Supercnn: A superpixelwise convolutional neural network for salient object detection. Int J Comput Vis 115(3):330–344

    Article  MathSciNet  Google Scholar 

  100. Tang Y, Wu X (2016) Saliency detection via combining region-level and pixel-level predictions with cnns. In: European Conference on Computer Vision. Springer, pp 809–825

  101. Wang X, Ma H, Chen X, You S (2017) Edge preserving and multi-scale contextual neural network for salient object detection. IEEE Trans Image Process 27(1):121–134

    Article  MathSciNet  MATH  Google Scholar 

  102. Gao X, Wang N, Tao D, Li X (2012) Face sketch–photo synthesis and retrieval using sparse representation. IEEE Trans Circ Sys Video Technol 22(8):1213–1226

    Article  Google Scholar 

  103. Niu B, Yang Q, Shiu S C K, Pal S K (2008) Two-dimensional laplacianfaces method for face recognition. Pattern Recogn 41(10):3237–3243

    Article  MATH  Google Scholar 

  104. Wang N, Tao D, Gao X, Li X, Li J (2014) A comprehensive survey to face hallucination. Int J Comput Vis 106(1):9–30

    Article  Google Scholar 

  105. Majumder A, Behera L, Subramanian V K (2016) Automatic facial expression recognition system using deep network-based data fusion. IEEE Trans Cybern 48(1):103–114

    Article  Google Scholar 

  106. Jiang H, Learned-Miller E (2017) Face detection with the faster r-cnn. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). IEEE, pp 650–657

  107. Sun X, Wu P, Hoi Steven CH (2018) Face detection using deep learning: An improved faster rcnn approach. Neurocomputing 299:42–50

    Article  Google Scholar 

  108. Wang H, Li Z, Ji X, Wang Y (2017) Face r-cnn. arXiv:1706.01061

  109. Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: Unifying landmark localization with end to end object detection. arXiv:1509.04874

  110. Li Y, Sun B, Wu T, Wang Y (2016) Face detection with end-to-end integration of a convnet and a 3d model. . In: European Conference on Computer Vision. Springer, pp 420–436

  111. Zhang L, Lin L, Liang X, He K (2016) Is faster r-cnn doing well for pedestrian detection? . In: European conference on computer vision. Springer, pp 443–457

  112. Tian Y, Luo P, Wang X, Tang X (2015) Deep learning strong parts for pedestrian detection. In: Proceedings of the IEEE international conference on computer vision, pp 1904–1912

  113. Cai Z, Saberian M, Vasconcelos N (2015) Learning complexity-aware cascades for deep pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3361–3369

  114. Reid D (1979) An algorithm for tracking multiple targets. IEEE Trans Autom Control 24 (6):843–854

    Article  Google Scholar 

  115. Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 3645–3649

  116. Leal-Taixé L, Canton-Ferrer C, Schindler K (2016) Learning by tracking: Siamese cnn for robust target association. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 33–40

  117. Bae S-H, Yoon K-J (2017) Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking. IEEE Trans Pattern Anal Mach Intell 40(3):595–610

    Article  Google Scholar 

  118. Bae S-H, Yoon K-J (2014) Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1218–1225

  119. Wang B, Wang L, Shuai B, Zuo Z, Liu T, Luk Chan K, Wang G (2016) Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 1–8

  120. Xiang Y, Alahi A, Savarese S (2015) Learning to track: Online multi-object tracking by decision making. In: Proceedings of the IEEE international conference on computer vision, pp 4705– 4713

  121. Tang S, Andriluka M, Andres B, Schiele B (2017) Multiple people tracking by lifted multicut and person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3539–3548

  122. Chen L, Ai H, Shang C, Zhuang Z, Bai B (2017) Online multi-object tracking with convolutional neural networks. In: 2017 IEEE International Conference on Image Processing (ICIP). IEEE, pp 645–649

  123. Chu Q, Ouyang W, Li H, Wang X, Liu B, Yu N (2017) Online multi-object tracking using cnn-based single object tracker with spatial-temporal attention mechanism. In: Proceedings of the IEEE International Conference on Computer Vision, pp 4836–4845

  124. Son J, Baek M, Cho M, Han B (2017) Multi-object tracking with quadruplet convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5620–5629

  125. Fang K (2016) Track-rnn: joint detection and tracking using recurrent neural networks. . In: Proceedings of the 29th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona

  126. Zhou S, Wang J, Wang J, Gong Y, Zheng N (2017) Point to set similarity based deep feature learning for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3741–3750

  127. Xiang J, Zhang G, Hou J, Sang N, Huang R (2018) Multiple target tracking by learning feature representation and distance metric jointly. arXiv:1802.03252

  128. Cheng D, Gong Y, Zhou S, Wang J, Zheng N (2016) Person re-identification by multi-channel parts-based cnn with improved triplet loss function. In: Proceedings of the iEEE conference on computer vision and pattern recognition, pp 1335–1344

  129. Ma C, Yang C, Yang F, Zhuang Y, Zhang Z, Jia H, Xie X (2018) Trajectory factory: Tracklet cleaving and re-connection by deep siamese bi-gru for multiple object tracking. In: 2018 IEEE International Conference on Multimedia and Expo (ICME). IEEE, pp 1–6

  130. Fernando T, Denman S, Sridharan S, Fookes C (2018) Task specific visual saliency prediction with memory augmented conditional generative adversarial networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp 1539–1548

  131. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, pp 2672–2680

  132. Gregor K, Danihelka I, Mnih A, Blundell C, Wierstra D (2014) Deep autoregressive networks. In: International Conference on Machine Learning. PMLR, pp 1242–1250

  133. Fang K, Xiang Y, Li X, Savarese S (2018) Recurrent autoregressive networks for online multi-object tracking. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp 466–475

  134. Fernando T, Denman S, Sridharan S, Fookes C (2018) Tracking by prediction: A deep generative model for mutli-person localisation and tracking. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pp 1122–1132

  135. Sadeghian A, Alahi A, Savarese S (2017) Tracking the untrackable: Learning to track multiple cues with long-term dependencies. In: Proceedings of the IEEE International Conference on Computer Vision, pp 300–311

  136. Kim C, Li F, Rehg J M (2018) Multi-object tracking with neural gating using bilinear lstm. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 200–215

  137. Schulter S, Vernaza P, Choi W, Chandraker M (2017) Deep network flow for multi-object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6951–6960

  138. Tang S, Andres B, Andriluka M, Schiele B (2016) Multi-person tracking by multicut and deep matching. In: European Conference on Computer Vision. Springer, pp 100–111

  139. Li W, Zhao R, Xiao T, Wang X (2014) Deepreid: Deep filter pairing neural network for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 152–159

  140. Zheng L, Bie Z, Sun Y, Wang J, Su C, Wang S, Tian Q (2016) Mars: A video benchmark for large-scale person re-identification. In: European Conference on Computer Vision. Springer, pp 868–884

  141. Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K (2015) Motchallenge 2015: Towards a benchmark for multi-target tracking. arXiv:1504.01942

  142. Milan A, Leal-Taixé L, Reid I, Roth S, Schindler K (2016) Mot16: A benchmark for multi-object tracking. arXiv:1603.00831

  143. Zhu Y, Zhao C, Wang J, Zhao X, Wu Y, Lu H (2017) Couplenet: Coupling global structure with local parts for object detection. In: Proceedings of the IEEE international conference on computer vision, pp 4126–4134

  144. Bodla N, Singh B, Chellappa R, Davis L S (2017) Soft-nms–improving object detection with one line of code. In: Proceedings of the IEEE international conference on computer vision, pp 5561–5569

  145. Sun S, Akhtar N, Song H, Mian A S, Shah M (2019) Deep affinity network for multiple object tracking. IEEE transactions on pattern analysis and machine intelligence

  146. Dollár P, Appel R, Belongie S, Perona P (2014) Fast feature pyramids for object detection. IEEE Trans Pattern Anal Mach Intell 36(8):1532–1545

    Article  Google Scholar 

  147. Shen J, Liang Z, Liu J, Sun H, Shao L, Tao D (2018) Multiobject tracking by submodular optimization. IEEE Trans Cybern 49(6):1990–2001

    Article  Google Scholar 

  148. Bochinski E, Eiselein V, Sikora T (2017) High-speed tracking-by-detection without using image information. In: 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, pp 1–6

  149. Pirsiavash H, Ramanan D, Fowlkes C C (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In: CVPR 2011. IEEE, pp 1201–1208

  150. Andriyenko A, Schindler K, Roth S (2012) Discrete-continuous optimization for multi-target tracking. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 1926–1933

  151. Wen L, Li W, Yan J, Lei Z, Yi D, Li S Z (2014) Multiple target tracking based on undirected hierarchical relation hypergraph. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1282–1289

  152. Dicle C, Camps O I, Sznaier M (2013) The way they move: Tracking multiple targets with similar appearance. In: Proceedings of the IEEE international conference on computer vision, pp 2304–2311

  153. Andriyenko A, Schindler K (2011) Multi-target tracking by continuous energy minimization. In: CVPR, vol 2, pp 7

  154. Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP). IEEE, pp 3464– 3468

  155. He R, Wu X, Sun Z, Tan T (2018) Wasserstein cnn: Learning invariant features for nir-vis face recognition. IEEE Trans Pattern Anal Mach Intell 41(7):1761–1773

    Article  Google Scholar 

  156. Saberian M J, Vasconcelos N (2012) Learning optimal embedded cascades. IEEE Trans Pattern Anal Mach Intell 34(10):2005–2018

    Article  Google Scholar 

  157. Datondji S R E, Dupuis Y, Subirats P, Vasseur P (2016) A survey of vision-based traffic monitoring of road intersections. IEEE Trans Intell Transp Syst 17(10):2681–2698

    Article  Google Scholar 

  158. Cheng G, Zhou P, Han J (2016) Learning rotation-invariant convolutional neural networks for object detection in vhr optical remote sensing images. IEEE Trans Geosci Remote Sens 54(12):7405–7415

    Article  Google Scholar 

  159. Cheng G, Han J (2016) A survey on object detection in optical remote sensing images. ISPRS J Photogramm Remote Sens 117:11–28

    Article  Google Scholar 

  160. Shivakumara P, Tang D, Asadzadehkaljahi M, Lu T, Pal U, Anisi M H (2018) Cnn-rnn based method for license plate recognition. CAAI Trans Intell Technol 3(3):169–175

    Article  Google Scholar 

  161. Sarfraz M, Ahmed M J (2019) An approach to license plate recognition system using neural network. In: Exploring Critical Approaches of Evolutionary Computation. IGI Global, pp 20–36

  162. Nair A S, Raju S, Harikrishnan KJ, Mathew A (2018) A survey of techniques for license plate detection and recognition. i-manager’s J Image Process 5(1):25

    Article  Google Scholar 

  163. Banerjee K, Notz D, Windelen J, Gavarraju S, He M (2018) Online camera lidar fusion and object detection on hybrid data for autonomous driving. In: 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, pp 1632–1638

  164. Arnold E, Al-Jarrah O Y, Dianati M, Fallah S, Oxtoby D, Mouzakitis A (2019) A survey on 3d object detection methods for autonomous driving applications. IEEE Trans Intell Transp Syst 20 (10):3782–3795

    Article  Google Scholar 

  165. Li Z, Dong M, Wen S, Hu X, Zhou P, Zeng Z (2019) Clu-cnns: Object detection for medical images. Neurocomputing 350:53–59

    Article  Google Scholar 

  166. Lu W, Zhou Y, Wan G, Hou S, Song S (2019) L3-net: Towards learning based lidar localization for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6389–6398

  167. Altaf F, Islam Syed MS, Akhtar N, Janjua N K (2019) Going deep in medical image analysis: Concepts, methods, challenges, and future directions. IEEE Access 7:99540–99572

    Article  Google Scholar 

  168. Naji S, Jalab H A, Kareem S A (2019) A survey on skin detection in colored images. Artif Intell Rev 52(2):1041–1087

    Article  Google Scholar 

  169. Anderson P, He X, Buehler C, Teney D, Johnson M, Gould S, Zhang L (2018) Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6077–6086

  170. Friedman S, Stamos I (2013) Online detection of repeated structures in point clouds of urban scenes for compression and registration. Int J Comput Vis 102(1-3):112–128

    Article  Google Scholar 

  171. Bai S, An S (2018) A survey on automatic image caption generation. Neurocomputing 311:291–304

    Article  Google Scholar 

  172. Yang W, Tan R T, Feng J, Guo Z, Yan S, Liu J (2019) Joint rain detection and removal from a single image with contextualized deep networks. IEEE Trans Pattern Anal Mach Intell 42(6):1377–1393

    Article  Google Scholar 

  173. Sen D, Pal S K (2008) Generalized rough sets, entropy, and image ambiguity measures. IEEE Trans Syst Man Cybern Part B (Cybern) 39(1):117–128

    Article  Google Scholar 

  174. Ganivada A, Ray S S, Pal S K (2012) Fuzzy rough granular self-organizing map and fuzzy rough entropy. Theor Comput Sci 466:37–63

    Article  MathSciNet  MATH  Google Scholar 

  175. Pal S K, Mitra S (1992) Multi-layer perceptron, fuzzy sets and classification. IEEE Trans Neural Netw 3(5):683–697

    Article  Google Scholar 

  176. Mitra S, Pal S K (1995) Fuzzy multi-layer perceptron, inferencing and rule generation. IEEE Trans Neural Netw 6(1):51–63

    Article  Google Scholar 

  177. Sen D, Pal SK (2010) Gradient histogram: thresholding in a region of interest for edge detection. Image Vis Comput 28(4):677–695

    Article  Google Scholar 

  178. Pramanik A, Pal SK, Maiti J, Mitra P (2021) Granulated RCNN and multi-class deep sort for multi-object detection and tracking. IEEE Transactions on Emerging Topics in Computational Intelligence. https://doi.org/10.1109/TETCI.2020.3041019

  179. Pal SK, Banerjee R, Dutta S, Sarma SS (2013) An insight into the Z-number approach to CWW. Fundamenta Informaticae 124(1–2):197–229

    Article  MathSciNet  Google Scholar 

  180. Banerjee R, Pal SK (2015) Z*-numbers: augmented Z-numbers for machine-subjectivity representation. Inform Sci 323:143–178

    Article  MathSciNet  Google Scholar 

  181. Pal SK, Mandal DP (1992) Linguistic recognition system based on approximate reasoning. Inform Sci 61(1–2):135–161

    Article  Google Scholar 

Download references

Acknowledgements

S.K. Pal acknowledges the National Science Chair, SERB-DST, Government of India.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Anima Pramanik.

Ethics declarations

Conflict of interest

The paper is original in its contents and is not under consideration for publication in any other journals/proceedings. There is no potential conflict of interest to disclose, such as employment, financial or non-financial interest. There is no funding received by this work. The authors have no financial or proprietary interests in any material discussed in this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: 30th Anniversary Special Issue

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Pal, S.K., Pramanik, A., Maiti, J. et al. Deep learning in multi-object detection and tracking: state of the art. Appl Intell 51, 6400–6429 (2021). https://doi.org/10.1007/s10489-021-02293-7

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-021-02293-7

Keywords

Navigation