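Journal of Computer Applications, 2016, Vol. 36, Issue (9): 2508-2515. DOI: 10.11772/j.issn.1001-9081.2016.09.2508
LI Yandong, HAO Zongbo, LEI Hang
Received:
2016-03-30
Revised:
2016-04-20
Online:
2016-09-10
Published:
2016-09-08
Corresponding author:
LI Yandong
About the authors:
LI Yandong (born 1984), male, from Luzhou, Sichuan, Ph.D. candidate; research interests: machine learning, computer vision. HAO Zongbo (born 1977), male, from Xinxiang, Henan, associate professor, Ph.D.; research interests: image understanding, video information processing. LEI Hang (born 1960), male, from Zigong, Sichuan, professor, Ph.D.; research interests: image processing.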
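Abstract: In recent years, convolutional neural networks (CNNs) have achieved a series of breakthrough results in image classification, object detection, semantic image segmentation, and related fields. Their strong feature learning and classification ability has attracted wide attention and merits careful analysis and study. This survey first reviews the development history of CNNs and introduces their basic structure and operating principles. It then summarizes and analyzes recent CNN research from four perspectives: network overfitting, network structure, transfer learning, and theoretical analysis. Next, it summarizes and discusses the latest results in CNN-based application areas. Finally, it points out the current shortcomings of CNNs and directions for future development.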
LI Yandong, HAO Zongbo, LEI Hang. Survey of convolutional neural network (卷积神经网络研究综述)[J]. Journal of Computer Applications (计算机应用), 2016, 36(9): 2508-2515.