tutorial

A Survey on Deep Learning: Algorithms, Techniques, and Applications

Published: 18 September 2018

Abstract

The field of machine learning is witnessing its golden era as deep learning steadily becomes the leader in this domain. Deep learning builds computational models that use multiple layers to represent data at successive levels of abstraction. Key enabling deep learning algorithms, such as generative adversarial networks, convolutional neural networks, and model transfers, have completely changed our perception of information processing. However, there is a gap in understanding behind this fast-paced domain, because it has not previously been presented from a multiscope perspective. The lack of core understanding renders these powerful methods black-box machines, which inhibits development at a fundamental level. Moreover, deep learning has repeatedly been perceived as a silver bullet for every stumbling block in machine learning, which is far from the truth. This article presents a comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing, followed by an in-depth analysis of pivotal and groundbreaking advances in deep learning applications. It also reviews open issues in deep learning, such as unsupervised learning, black-box models, and online learning, and illustrates how these challenges can be transformed into prolific future research avenues.

  115. Maryam M. Najafabadi, Flavio Villanustre, Taghi M. Khoshgoftaar, Naeem Seliya, Randall Wald, and Edin Muharemagic. 2015. Deep learning applications and challenges in big data analytics. Journal of Big Data 2, 1 (2015), 1--21.Google ScholarGoogle ScholarCross RefCross Ref
  116. Preslav Nakov, Alan Ritter, Sara Rosenthal, Fabrizio Sebastiani, and Veselin Stoyanov. 2016. SemEval-2016 task 4: Sentiment analysis in Twitter. In The 10th International Workshop on Semantic Evaluation. Association for Computer Linguistics, 1--18.Google ScholarGoogle ScholarCross RefCross Ref
  117. Kazuhiro Negi, Keisuke Dohi, Yuichiro Shibata, and Kiyoshi Oguri. 2011. Deep pipelined one-chip FPGA implementation of a real-time image-based human detection algorithm. In International Conference on Field-Programmable Technology. IEEE, 1--8.Google ScholarGoogle ScholarCross RefCross Ref
  118. Michael Neumann and Ngoc Thang Vu. 2017. Attentive convolutional neural network based speech emotion recognition: A study on the impact of input features, signal length, and acted speech. CoRR abs/1706.00612 (2017). arxiv:1706.00612. Retrieved from http://arxiv.org/abs/1706.00612.Google ScholarGoogle Scholar
  119. Evan W. Newell and Yang Cheng. 2016. Mass cytometry: Blessed with the curse of dimensionality. Nature Immunology 17, 8 (2016), 890--895.Google ScholarGoogle ScholarCross RefCross Ref
  120. Dat Tien Nguyen, Shafiq R. Joty, Muhammad Imran, Hassan Sajjad, and Prasenjit Mitra. 2016. Applications of online deep learning for crisis response using social media information. CoRR abs/1610.01030 (2016). Retrieved from http://arxiv.org/abs/1610.01030.Google ScholarGoogle Scholar
  121. Laisen Nie, Dingde Jiang, Lei Guo, Shui Yu, and Houbing Song. 2016. Traffic matrix prediction and estimation based on deep learning for data center networks. In IEEE Globecom Workshops. IEEE, 1--6.Google ScholarGoogle ScholarCross RefCross Ref
  122. Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. 2015. Learning deconvolution network for semantic segmentation. In IEEE International Conference on Computer Vision. IEEE, 1520--1528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  123. Pascal VOC. 2012. The PASCAL Visual Object Classes. Retrieved from http://host.robots.ox.ac.uk/pascal/VOC/. Accessed April 18, 2017.Google ScholarGoogle Scholar
  124. Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2013. How to construct deep recurrent neural networks. CoRR abs/1312.6026 (2013). Retrieved from http://arxiv.org/abs/1312.6026.Google ScholarGoogle Scholar
  125. Santiago Pascual, Antonio Bonafonte, and Joan Serrà. 2017. SEGAN: Speech enhancement generative adversarial network. CoRR abs/1703.09452 (2017). arxiv:1703.09452. Retrieved from http://arxiv.org/abs/1703.09452.Google ScholarGoogle Scholar
  126. Ryan Poplin, Avinash V. Varadarajan, Katy Blumer, Yun Liu, Michael V. McConnell, Greg S. Corrado, Lily Peng, and Dale R. Webster. 2018. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nature Biomedical Engineering 2, 3 (2018), 158--164.Google ScholarGoogle ScholarCross RefCross Ref
  127. Samira Pouyanfar and Shu-Ching Chen. 2017. Automatic video event detection for imbalance data using enhanced ensemble deep learning. International Journal of Semantic Computing 11, 1 (2017), 85--109.Google ScholarGoogle ScholarCross RefCross Ref
  128. Samira Pouyanfar and Shu-Ching Chen. 2017. T-LRA: Trend-based learning rate annealing for deep neural networks. In The 3rd IEEE International Conference on Multimedia Big Data. IEEE, 50--57.Google ScholarGoogle ScholarCross RefCross Ref
  129. Samira Pouyanfar, Shu-Ching Chen, and Mei-Ling Shyu. 2017. An efficient deep residual-inception network for multimedia classification. In International Conference on Multimedia and Expo. IEEE, 373--378.Google ScholarGoogle ScholarCross RefCross Ref
  130. Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434 (2015). Retrieved from http://arxiv.org/abs/1511.06434.Google ScholarGoogle Scholar
  131. Rajesh Ranganath, Adler J. Perotte, Noémie Elhadad, and David M. Blei. 2016. Deep survival analysis. In Machine Learning in Health Care. JMLR.org, 101--114.Google ScholarGoogle Scholar
  132. Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 779--788.Google ScholarGoogle ScholarCross RefCross Ref
  133. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems. MIT Press, 91--99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  134. Frank Rosenblatt. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65, 6 (1958), 386.Google ScholarGoogle ScholarCross RefCross Ref
  135. Ruslan Salakhutdinov and Geoffrey Hinton. 2009. Deep Boltzmann machines. In Artificial Intelligence and Statistics. PMLR, 448--455.Google ScholarGoogle Scholar
  136. Ruslan Salakhutdinov and Geoffrey Hinton. 2012. An efficient learning procedure for deep Boltzmann machines. Neural Computation 24, 8 (2012), 1967--2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  137. Dominik Scherer, Andreas Müller, and Sven Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. International Conference on Artificial Neural Networks 6354 (2010), 92--101. Google ScholarGoogle ScholarDigital LibraryDigital Library
  138. Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85--117. Google ScholarGoogle ScholarDigital LibraryDigital Library
  139. Frank Seide, Gang Li, and Dong Yu. 2011. Conversational speech transcription using context-dependent deep neural networks. In 12th Annual Conference of the International Speech Communication Association. ISCA, 437--440.Google ScholarGoogle ScholarCross RefCross Ref
  140. Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, and Yann LeCun. 2013. Pedestrian detection with unsupervised multi-stage feature learning. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 3626--3633. Google ScholarGoogle ScholarDigital LibraryDigital Library
  141. Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. 2014. Learning semantic representations using convolutional neural networks for web search. In The 23rd International World Wide Web Conference. ACM, 373--374. Google ScholarGoogle ScholarDigital LibraryDigital Library
  142. Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014). Retrieved from http://arxiv.org/abs/1409.1556.Google ScholarGoogle Scholar
  143. Skymind. 2017. Deeplearning4j deep learning framework. Retrieved from https://deeplearning4j.org. Accessed April 18, 2017.Google ScholarGoogle Scholar
  144. Paul Smolensky. 1986. Information Processing in Dynamical Systems: Foundations of Harmony Theory. Technical Report. DTIC Document.Google ScholarGoogle Scholar
  145. Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. 2011. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems, Vol. 24. Neural Information Processing Systems Foundation, 801--809. Google ScholarGoogle ScholarDigital LibraryDigital Library
  146. Richard Socher, Cliff C. Lin, Chris Manning, and Andrew Y. Ng. 2011. Parsing natural scenes and natural language with recursive neural networks. In International Conference on Machine Learning. Omnipress, 129--136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  147. Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Conference on Empirical Methods in Natural Language Processing. Citeseer, Association for Computational Linguistics, 1631--1642.Google ScholarGoogle Scholar
  148. Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. CoRR abs/1212.0402 (2012). Retrieved from http://arxiv.org/abs/1212.0402.Google ScholarGoogle Scholar
  149. Hang Su and Haoyu Chen. 2015. Experiments on parallel training of deep neural network using model averaging. CoRR abs/1507.01239 (2015). Retrieved from http://arxiv.org/abs/1507.01239.Google ScholarGoogle Scholar
  150. Ilya Sutskever, James Martens, George E. Dahl, and Geoffrey E. Hinton. 2013. On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning. JMLR.org, 1139--1147. Google ScholarGoogle ScholarDigital LibraryDigital Library
  151. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 1--9.Google ScholarGoogle ScholarCross RefCross Ref
  152. Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2016. YFCC100M: The new data in multimedia research. Communications of the ACM 59, 2 (2016), 64--73. Google ScholarGoogle ScholarDigital LibraryDigital Library
  153. Haiman Tian and Shu-Ching Chen. 2017. MCA-NN: Multiple correspondence analysis based neural network for disaster information detection. In The 3rd IEEE International Conference on Multimedia Big Data. IEEE, 268--275.Google ScholarGoogle ScholarCross RefCross Ref
  154. Haiman Tian and Shu-Ching Chen. 2017. A video-aided semantic analytics system for disaster information integration. In The 3rd IEEE International Conference on Multimedia Big Data. IEEE, 242--243.Google ScholarGoogle ScholarCross RefCross Ref
  155. Antonio Torralba, Rob Fergus, and William T. Freeman. 2008. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 11 (2008), 1958--1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  156. Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3D convolutional networks. In IEEE International Conference on Computer Vision. IEEE, 4489--4497. Google ScholarGoogle ScholarDigital LibraryDigital Library
  157. Transfer Learning. 2017. Convolutional Neural Network for Visual Recognition. Retrieved from http://cs231n.github.io/transfer-learning/. Accessed April 25, 2017.Google ScholarGoogle Scholar
  158. Trecvid. 2017. TREC Video Retrieval Evaluation. Retrieved from http://trecvid.nist.gov. Accessed April 18, 2017.Google ScholarGoogle Scholar
  159. Grigorios Tsagkatakis, Mustafa Jaber, and Panagiotis Tsakalides. 2017. Goal!! Event detection in sports video. Electronic Imaging 2017, 16 (2017), 15--20.Google ScholarGoogle ScholarCross RefCross Ref
  160. Nicolas Vasilache, Jeff Johnson, Michaël Mathieu, Soumith Chintala, Serkan Piantino, and Yann LeCun. 2014. Fast convolutional nets with fbfft: A GPU performance evaluation. CoRR abs/1412.7580 (2014). Retrieved from http://arxiv.org/abs/1412.7580.Google ScholarGoogle Scholar
  161. Soroush Vosoughi, Prashanth Vijayaraghavan, and Deb Roy. 2016. Tweet2Vec: learning Tweet embeddings using character-level CNN-LSTM encoder-decoder. In The 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1041--1044. Google ScholarGoogle ScholarDigital LibraryDigital Library
  162. Chao Wang, Lei Gong, Qi Yu, Xi Li, Yuan Xie, and Xuehai Zhou. 2016. DLAU: A scalable deep learning accelerator unit on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36, 3 (2016), 513--517. Google ScholarGoogle ScholarDigital LibraryDigital Library
  163. Peng Wang, Baowen Xu, Yurong Wu, and Xiaoyu Zhou. 2015. Link prediction in social networks: The state-of-the-art. Science China Information Sciences 58, 1 (2015), 1--38.Google ScholarGoogle ScholarCross RefCross Ref
  164. Joonatas Wehrmann, Willian Becker, Henry E. L. Cagnini, and Rodrigo C. Barros. 2017. A character-based convolutional neural network for language-agnostic Twitter sentiment analysis. In International Joint Conference on Neural Networks. IEEE, 2384--2391.Google ScholarGoogle Scholar
  165. Chao Weng, Dong Yu, Michael L. Seltzer, and Jasha Droppo. 2015. Deep neural networks for single-channel multi-talker speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing 23, 10 (2015), 1670--1679. Google ScholarGoogle ScholarDigital LibraryDigital Library
  166. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016). arxiv:1609.08144. Retrieved from http://arxiv.org/abs/1609.08144.Google ScholarGoogle Scholar
  167. Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2016. Aggregated residual transformations for deep neural networks. CoRR abs/1611.05431 (2016). Retrieved from http://arxiv.org/abs/1611.05431.Google ScholarGoogle Scholar
  168. Omry Yadan, Keith Adams, Yaniv Taigman, and Marc’Aurelio Ranzato. 2013. Multi-GPU training of ConvNets. CoRR abs/1312.5853 (2013).Google ScholarGoogle Scholar
  169. Yilin Yan, Min Chen, Saad Sadiq, and Mei-Ling Shyu. 2017. Efficient imbalanced multimedia concept retrieval by deep learning on spark clusters. International Journal of Multimedia Data Engineering and Management 8, 1 (2017), 1--20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  170. Yilin Yan, Min Chen, Mei-Ling Shyu, and Shu-Ching Chen. 2015. Deep learning for imbalanced multimedia data classification. In The IEEE International Symposium on Multimedia. IEEE, 483--488.Google ScholarGoogle ScholarCross RefCross Ref
  171. Yilin Yan, Qiusha Zhu, Mei-Ling Shyu, and Shu-Ching Chen. 2016. A classifier ensemble framework for multimedia big data classification. In The 17th IEEE International Conference on Information Reuse and Integration. IEEE, 615--622.Google ScholarGoogle ScholarDigital LibraryDigital Library
  172. Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2015. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. CoRR abs/1512.05193 (2015). arxiv:1512.05193. Retrieved from http://arxiv.org/abs/1512.05193.Google ScholarGoogle Scholar
  173. Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoffrey Zweig, Christopher J. Rossbach, and Jon Currey. 2014. An introduction to computational networks and the computational network toolkit. In The 15th Annual Conference of the International Speech Communication Association. ISCA.Google ScholarGoogle Scholar
  174. Dong Yu, Morten Kolbæk, Zheng-Hua Tan, and Jesper Jensen. 2017. Permutation invariant training of deep models for speaker-independent multi-talker speech separation. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 241--245.Google ScholarGoogle ScholarCross RefCross Ref
  175. Qi Yu, Chao Wang, Xiang Ma, Xi Li, and Xuehai Zhou. 2015. A deep learning prediction process accelerator based FPGA. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE, 1159--1162.Google ScholarGoogle ScholarDigital LibraryDigital Library
  176. Matthew D. Zeiler. 2012. ADADELTA: An adaptive learning rate method. CoRR abs/1212.5701 (2012). Retrieved from http://arxiv.org/abs/1212.5701.Google ScholarGoogle Scholar
  177. Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 161--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  178. Xueliang Zhang and DeLiang Wang. 2017. Deep learning based binaural speech separation in reverberant environments. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25 (2017), 1075--1084. Google ScholarGoogle ScholarDigital LibraryDigital Library
  179. Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems. 649--657. Google ScholarGoogle ScholarDigital LibraryDigital Library
  180. Yue Zhao, Xingyu Jin, and Xiaolin Hu. 2017. Recurrent convolutional neural networks for speech processing. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE SigPort, 5300--5304.Google ScholarGoogle ScholarCross RefCross Ref
  181. Zhiwei Zhao and Youzheng Wu. 2016. Attention-based convolutional neural networks for sentence classification. In The 17th Annual Conference of the International Speech Communication Association. ISCA, 705--709.Google ScholarGoogle ScholarCross RefCross Ref

            • Published in

              ACM Computing Surveys  Volume 51, Issue 5
              September 2019
              791 pages
              ISSN:0360-0300
              EISSN:1557-7341
              DOI:10.1145/3271482
              • Editor: Sartaj Sahni

              Copyright © 2018 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 18 September 2018
              • Accepted: 1 June 2018
              • Revised: 1 April 2018
              • Received: 1 May 2017

              Qualifiers

              • tutorial
              • Research
              • Refereed
