Abstract
The field of machine learning is witnessing its golden era as deep learning steadily becomes the leading approach in this domain. Deep learning uses multiple layers to represent abstractions of data and build computational models. Key enabling deep learning algorithms, such as generative adversarial networks, convolutional neural networks, and model transfers, have completely changed our perception of information processing. However, a gap in understanding remains behind this fast-paced domain, because it has never previously been presented from a multiscope perspective. This lack of core understanding renders these powerful methods black-box machines, which inhibits development at a fundamental level. Moreover, deep learning has repeatedly been perceived as a silver bullet for all stumbling blocks in machine learning, which is far from the truth. This article presents a comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing, followed by an in-depth analysis of pivotal and groundbreaking advances in deep learning applications. It also reviews open issues in deep learning, such as unsupervised learning, black-box models, and online learning, and illustrates how these challenges can be transformed into prolific future research avenues.
- Samira Pouyanfar, Shu-Ching Chen, and Mei-Ling Shyu. 2017. An efficient deep residual-inception network for multimedia classification. In International Conference on Multimedia and Expo. IEEE, 373--378.Google ScholarCross Ref
- Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434 (2015). Retrieved from http://arxiv.org/abs/1511.06434.Google Scholar
- Rajesh Ranganath, Adler J. Perotte, Noémie Elhadad, and David M. Blei. 2016. Deep survival analysis. In Machine Learning in Health Care. JMLR.org, 101--114.Google Scholar
- Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 779--788.Google ScholarCross Ref
- Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems. MIT Press, 91--99. Google ScholarDigital Library
- Frank Rosenblatt. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65, 6 (1958), 386.Google ScholarCross Ref
- Ruslan Salakhutdinov and Geoffrey Hinton. 2009. Deep Boltzmann machines. In Artificial Intelligence and Statistics. PMLR, 448--455.Google Scholar
- Ruslan Salakhutdinov and Geoffrey Hinton. 2012. An efficient learning procedure for deep Boltzmann machines. Neural Computation 24, 8 (2012), 1967--2006. Google ScholarDigital Library
- Dominik Scherer, Andreas Müller, and Sven Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. International Conference on Artificial Neural Networks 6354 (2010), 92--101. Google ScholarDigital Library
- Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85--117. Google ScholarDigital Library
- Frank Seide, Gang Li, and Dong Yu. 2011. Conversational speech transcription using context-dependent deep neural networks. In 12th Annual Conference of the International Speech Communication Association. ISCA, 437--440.Google ScholarCross Ref
- Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, and Yann LeCun. 2013. Pedestrian detection with unsupervised multi-stage feature learning. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 3626--3633. Google ScholarDigital Library
- Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. 2014. Learning semantic representations using convolutional neural networks for web search. In The 23rd International World Wide Web Conference. ACM, 373--374. Google ScholarDigital Library
- Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014). Retrieved from http://arxiv.org/abs/1409.1556.Google Scholar
- Skymind. 2017. Deeplearning4j deep learning framework. Retrieved from https://deeplearning4j.org. Accessed April 18, 2017.Google Scholar
- Paul Smolensky. 1986. Information Processing in Dynamical Systems: Foundations of Harmony Theory. Technical Report. DTIC Document.Google Scholar
- Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. 2011. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems, Vol. 24. Neural Information Processing Systems Foundation, 801--809. Google ScholarDigital Library
- Richard Socher, Cliff C. Lin, Chris Manning, and Andrew Y. Ng. 2011. Parsing natural scenes and natural language with recursive neural networks. In International Conference on Machine Learning. Omnipress, 129--136. Google ScholarDigital Library
- Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Conference on Empirical Methods in Natural Language Processing. Citeseer, Association for Computational Linguistics, 1631--1642.Google Scholar
- Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. CoRR abs/1212.0402 (2012). Retrieved from http://arxiv.org/abs/1212.0402.Google Scholar
- Hang Su and Haoyu Chen. 2015. Experiments on parallel training of deep neural network using model averaging. CoRR abs/1507.01239 (2015). Retrieved from http://arxiv.org/abs/1507.01239.Google Scholar
- Ilya Sutskever, James Martens, George E. Dahl, and Geoffrey E. Hinton. 2013. On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning. JMLR.org, 1139--1147. Google ScholarDigital Library
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 1--9.Google ScholarCross Ref
- Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2016. YFCC100M: The new data in multimedia research. Communications of the ACM 59, 2 (2016), 64--73. Google ScholarDigital Library
- Haiman Tian and Shu-Ching Chen. 2017. MCA-NN: Multiple correspondence analysis based neural network for disaster information detection. In The 3rd IEEE International Conference on Multimedia Big Data. IEEE, 268--275.Google ScholarCross Ref
- Haiman Tian and Shu-Ching Chen. 2017. A video-aided semantic analytics system for disaster information integration. In The 3rd IEEE International Conference on Multimedia Big Data. IEEE, 242--243.Google ScholarCross Ref
- Antonio Torralba, Rob Fergus, and William T. Freeman. 2008. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 11 (2008), 1958--1970. Google ScholarDigital Library
- Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3D convolutional networks. In IEEE International Conference on Computer Vision. IEEE, 4489--4497. Google ScholarDigital Library
- Transfer Learning. 2017. Convolutional Neural Network for Visual Recognition. Retrieved from http://cs231n.github.io/transfer-learning/. Accessed April 25, 2017.Google Scholar
- Trecvid. 2017. TREC Video Retrieval Evaluation. Retrieved from http://trecvid.nist.gov. Accessed April 18, 2017.Google Scholar
- Grigorios Tsagkatakis, Mustafa Jaber, and Panagiotis Tsakalides. 2017. Goal!! Event detection in sports video. Electronic Imaging 2017, 16 (2017), 15--20.Google ScholarCross Ref
- Nicolas Vasilache, Jeff Johnson, Michaël Mathieu, Soumith Chintala, Serkan Piantino, and Yann LeCun. 2014. Fast convolutional nets with fbfft: A GPU performance evaluation. CoRR abs/1412.7580 (2014). Retrieved from http://arxiv.org/abs/1412.7580.Google Scholar
- Soroush Vosoughi, Prashanth Vijayaraghavan, and Deb Roy. 2016. Tweet2Vec: learning Tweet embeddings using character-level CNN-LSTM encoder-decoder. In The 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1041--1044. Google ScholarDigital Library
- Chao Wang, Lei Gong, Qi Yu, Xi Li, Yuan Xie, and Xuehai Zhou. 2016. DLAU: A scalable deep learning accelerator unit on FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 36, 3 (2016), 513--517. Google ScholarDigital Library
- Peng Wang, Baowen Xu, Yurong Wu, and Xiaoyu Zhou. 2015. Link prediction in social networks: The state-of-the-art. Science China Information Sciences 58, 1 (2015), 1--38.Google ScholarCross Ref
- Joonatas Wehrmann, Willian Becker, Henry E. L. Cagnini, and Rodrigo C. Barros. 2017. A character-based convolutional neural network for language-agnostic Twitter sentiment analysis. In International Joint Conference on Neural Networks. IEEE, 2384--2391.Google Scholar
- Chao Weng, Dong Yu, Michael L. Seltzer, and Jasha Droppo. 2015. Deep neural networks for single-channel multi-talker speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing 23, 10 (2015), 1670--1679. Google ScholarDigital Library
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016). arxiv:1609.08144. Retrieved from http://arxiv.org/abs/1609.08144.Google Scholar
- Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2016. Aggregated residual transformations for deep neural networks. CoRR abs/1611.05431 (2016). Retrieved from http://arxiv.org/abs/1611.05431.Google Scholar
- Omry Yadan, Keith Adams, Yaniv Taigman, and Marc’Aurelio Ranzato. 2013. Multi-GPU training of ConvNets. CoRR abs/1312.5853 (2013).Google Scholar
- Yilin Yan, Min Chen, Saad Sadiq, and Mei-Ling Shyu. 2017. Efficient imbalanced multimedia concept retrieval by deep learning on spark clusters. International Journal of Multimedia Data Engineering and Management 8, 1 (2017), 1--20. Google ScholarDigital Library
- Yilin Yan, Min Chen, Mei-Ling Shyu, and Shu-Ching Chen. 2015. Deep learning for imbalanced multimedia data classification. In The IEEE International Symposium on Multimedia. IEEE, 483--488.Google ScholarCross Ref
- Yilin Yan, Qiusha Zhu, Mei-Ling Shyu, and Shu-Ching Chen. 2016. A classifier ensemble framework for multimedia big data classification. In The 17th IEEE International Conference on Information Reuse and Integration. IEEE, 615--622.Google ScholarDigital Library
- Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. 2015. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. CoRR abs/1512.05193 (2015). arxiv:1512.05193. Retrieved from http://arxiv.org/abs/1512.05193.Google Scholar
- Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoffrey Zweig, Christopher J. Rossbach, and Jon Currey. 2014. An introduction to computational networks and the computational network toolkit. In The 15th Annual Conference of the International Speech Communication Association. ISCA.Google Scholar
- Dong Yu, Morten Kolbæk, Zheng-Hua Tan, and Jesper Jensen. 2017. Permutation invariant training of deep models for speaker-independent multi-talker speech separation. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 241--245.Google ScholarCross Ref
- Qi Yu, Chao Wang, Xiang Ma, Xi Li, and Xuehai Zhou. 2015. A deep learning prediction process accelerator based FPGA. In 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE, 1159--1162.Google ScholarDigital Library
- Matthew D. Zeiler. 2012. ADADELTA: An adaptive learning rate method. CoRR abs/1212.5701 (2012). Retrieved from http://arxiv.org/abs/1212.5701.Google Scholar
- Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 161--170. Google ScholarDigital Library
- Xueliang Zhang and DeLiang Wang. 2017. Deep learning based binaural speech separation in reverberant environments. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25 (2017), 1075--1084. Google ScholarDigital Library
- Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems. 649--657. Google ScholarDigital Library
- Yue Zhao, Xingyu Jin, and Xiaolin Hu. 2017. Recurrent convolutional neural networks for speech processing. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE SigPort, 5300--5304.Google ScholarCross Ref
- Zhiwei Zhao and Youzheng Wu. 2016. Attention-based convolutional neural networks for sentence classification. In The 17th Annual Conference of the International Speech Communication Association. ISCA, 705--709.Google ScholarCross Ref
Index Terms
- A Survey on Deep Learning: Algorithms, Techniques, and Applications