A survey of deep learning methods and software tools for image classification and object detection

  • Representation, Processing, Analysis and Understanding of Images
Pattern Recognition and Image Analysis

Abstract

We survey deep learning methods for image classification and object detection. In particular, we consider deep models such as autoencoders, restricted Boltzmann machines, and convolutional neural networks. We also compare existing software packages for solving deep learning problems.

Author information

Corresponding author

Correspondence to P. N. Druzhkov.

Additional information

This paper is based on a report presented at the 9th Open German-Russian Workshop on Pattern Recognition and Image Understanding, held in Koblenz, December 1–5, 2014 (OGRW-9-2014).

The article is published in the original.

Pavel Nikolaevich Druzhkov. Born 1989. Graduated from Lobachevsky State University of Nizhni Novgorod in 2012. He is a junior researcher at Lobachevsky State University of Nizhni Novgorod.

Research interests: machine learning and data mining, computer vision.

Number of publications (monographs and articles): 6.

Valentina Dmitrievna Kustikova. Born 1987. Graduated from Lobachevsky State University of Nizhni Novgorod in 2010. Year of dissertation completion (Candidate’s, Doctoral): 2015, Candidate of Engineering Sciences. Assistant at Lobachevsky State University of Nizhni Novgorod.

Research interests: computer vision, machine learning, parallel computing.

Number of publications (monographs and articles): 8.

About this article

Cite this article

Druzhkov, P.N., Kustikova, V.D. A survey of deep learning methods and software tools for image classification and object detection. Pattern Recognit. Image Anal. 26, 9–15 (2016). https://doi.org/10.1134/S1054661816010065
