Abstract
Deep learning and reinforcement learning techniques have improved tremendously in recent years, yet fully automating learning and intelligence remains a challenge. The amalgamation of reinforcement learning with deep learning has produced breakthroughs in games and robotics over the past decade. Deep Reinforcement Learning (DRL) trains an agent on raw input, learning through interaction with its environment. Motivated by the recent successes of DRL, we explore its adaptability to different domains and application areas. This paper also presents a comprehensive survey of work done in recent years and of the simulation tools used for DRL. Researchers currently focus on recording experience more effectively and on refining the policy for future moves. We find that, despite strong results in Atari, Go, robotics, and multi-agent scenarios, challenges remain, including generalization, satisfying multiple objectives, divergence, and learning robust policies. Furthermore, complex environments and multiple agents pose new challenges and remain an open area of research.
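The agent–environment interaction loop that the abstract describes can be illustrated with a minimal sketch. This is not any specific system from the surveyed literature: it is a toy tabular Q-learning agent on a hypothetical 5-state "chain" environment, included only to show the loop (observe, act, receive reward, update) that DRL scales up by replacing the Q-table with a deep network trained on raw input.

```python
import random

# Toy "chain" environment (illustrative, not from the survey):
# states 0..4, actions 0 (left) / 1 (right); reward 1.0 on reaching
# state 4, which ends the episode.
class ChainEnv:
    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state = min(self.state + 1, self.n - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def train(env, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning: learn purely from interaction."""
    q = [[0.0, 0.0] for _ in range(env.n)]
    rng = random.Random(0)  # seeded for reproducibility
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection (exploration vs. exploitation)
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s2, r, done = env.step(a)
            # Temporal-difference update toward the bootstrapped target
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train(ChainEnv())
# The greedy policy should prefer action 1 (right) in every
# nonterminal state, i.e. move toward the rewarding goal state.
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(4)]
print(policy)
```

A DRL method such as DQN keeps exactly this loop but approximates `q` with a neural network, adds experience replay to record interactions, and uses a target network to stabilize the bootstrapped update.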
Availability of Data and Materials
Not Applicable.
References
Ahmad SHA, Liu M, Javidi T, Zhao Q, Krishnamachari B (2009) Optimality of myopic sensing in multichannel opportunistic access. IEEE Trans Inf Theory 55(9):4040–4050
Abdullah Al W, Yun ID (2018) Partial policy-based reinforcement learning for anatomical landmark localization in 3d medical images. arXiv:1807.02908
Alabbasi A, Ghosh A, Aggarwal V (2019) Deeppool: distributed model-free algorithm for ride-sharing using deep reinforcement learning. arXiv:1903.03882
Alansary A, Le Folgoc L, Vaillant G, Oktay O, Li Y, Bai W, Passerat-Palmbach J, Guerrero R, Kamnitsas K, Hou B et al (2018) Automatic view planning with multi-scale deep reinforcement learning agents. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 277–285
Amos B, Xu L, Kolter JZ (2017) Input convex neural networks. In: Proceedings of the 34th international conference on machine learning, vol 70. JMLR. org, pp 146–155
Anylogic (2018) The anylogic company’s webplatform. https://www.anylogic.com/. Accessed 01 June 2019
Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) A brief survey of deep reinforcement learning. arXiv:1708.05866
Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA (2017) Deep reinforcement learning: a brief survey. IEEE Signal Process Mag 34(6):26–38
Ashraf MI, Bennis M, Perfecto C, Saad W (2016) Dynamic proximity-aware resource allocation in vehicle-to-vehicle (v2v) communications. In: 2016 IEEE Globecom workshops (GC Wkshps)
Andrew HD, Nate K, John H, Willow G (2014) Gazebo: open source robotics foundation. http://gazebosim.org/. Accessed 28 May 2019
Baltrušaitis T, Robinson P, Morency L-P (2016) Openface: an open source facial behavior analysis toolkit. In: IEEE winter conference on applications of computer vision (WACV), pp 1–10. IEEE
Bard N, Foerster JN, Chandar S, Burch N, Lanctot M, Song HF, Parisotto E, Dumoulin V, Moitra S, Hughes E et al (2019) The Hanabi challenge: a new frontier for ai research. arXiv:1902.00506
Barros P, Bloem AC, Hootsmans IM, Opheij LM, Toebosch RHA, Barakova E, Sciutti A (2020) The chef’s hat simulation environment for reinforcement-learning-based agents. arXiv:2003.05861
Beattie C, Leibo JZ, Teplyashin D, Ward T, Wainwright M, Küttler H, Lefrancq A, Green S, Valdés V, Sadik A et al (2016) Deepmind lab. arXiv:1612.03801
Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. In: Proceedings of the 34th international conference on machine learning, vol 70, pp 449–458. JMLR. org
Bellemare MG, Naddaf Y, Veness J, Bowling M (2013) The arcade learning environment: an evaluation platform for general agents. J Artif Intell Res 47:253–279
Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Krüger FA, Light Y, Mak L, McGlinchey S et al (2014) The chembl bioactivity database: an update. Nucl Acids Res 42(D1):D1083–D1090
Beveridge JR, Phillips PJ, Bolme DS, Draper BA, Givens GH, Lui YM, Teli MN, Zhang H, Scruggs WT, Bowyer KW et al (2013) The challenge of face recognition from digital point-and-shoot cameras. In: IEEE sixth international conference on biometrics: theory, applications and systems (BTAS), pp 1–8. IEEE, 2013
Bode H, Heid S, Weber D, Hullermeier E, Wallscheid O (2020) Towards a scalable and flexible simulation and testing environment toolbox for intelligent microgrid control. arXiv:2005.04869
Bousmalis K, Irpan A, Wohlhart P, Bai Y, Kelcey M, Kalakrishnan M, Downs L, Ibarz J, Pastor P, Konolige K et al (2018) Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: IEEE international conference on robotics and automation (ICRA), pp 4243–4250. IEEE
Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) Openai gym. arXiv:1606.01540
Cai P, Mei X, Tai L, Sun Y, Liu M (2020) High-speed autonomous drifting with deep reinforcement learning. IEEE Robot Autom Lett 5(2):1247–1254
Cai Y, Osman S, Sharma M, Landis M, Li S (2015) Multi-modality vertebra recognition in arbitrary views using 3d deformable hierarchical model. IEEE Trans Med Imaging 34(8):1676–1693
Cao Q, Lin L, Shi Y, Liang X, Li G (2017) Attention-aware face hallucination via deep reinforcement learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 690–698
Chen T, Wencong S (2018) Indirect customer-to-customer energy trading with reinforcement learning. IEEE Trans Smart Grid 10(4):4338–4348
Chen X, Fang H, Lin T-Y, Vedantam R, Gupta S, Dollár P, Zitnick CL (2015) Microsoft coco captions: data collection and evaluation server. arXiv:1504.00325
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv:1406.1078
Choudhary T, Mishra V, Goswami A, Sarangapani J (2020) A comprehensive survey on model compression and acceleration. Artif Intell Rev 53:5113–5155. https://doi.org/10.1007/s10462-020-09816-7
Chu W-S, Song Y, Jaimes A (2015) Video co-summarization: video summarization by visual co-occurrence. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3584–3592
Cobbe K, Hesse C, Hilton J, Schulman J (2019) Leveraging procedural generation to benchmark reinforcement learning. arXiv:1912.01588
Côté M-A, Kádár Á, Yuan X, Kybartas B, Barnes T, Fine E, Moore J, Hausknecht M, El Asri L, Adada M et al (2018) Textworld: a learning environment for text-based games. arXiv:1806.11532
Coumans E, Bai Y (2016) Pybullet, a python module for physics simulation for games, robotics and machine learning. GitHub repository
Cui R, Yang C, Li Y, Sharma S (2017) Adaptive neural network control of auvs with control input nonlinearities using reinforcement learning. IEEE Trans Syst Man Cybern Syst 47(6):1019–1029
Daftry S, Bagnell JA, Hebert M (2016) Learning transferable policies for monocular reactive mav control. In: International symposium on experimental robotics, pp 3–11. Springer
Dai W, Gai Y, Krishnamachari B (2012) Efficient online learning for opportunistic spectrum access. In: Proceedings IEEE INFOCOM, pp 3086–3090. IEEE
Dai W, Gai Y, Krishnamachari B (2014) Online learning for multi-channel opportunistic access over unknown Markovian channels. In: Eleventh annual IEEE international conference on sensing, communication, and networking (SECON), pp 64–71. IEEE
Dalal G, Dvijotham K, Vecerik M, Hester T, Paduraru C, Tassa Y (2018) Safe exploration in continuous action spaces. arXiv:1801.08757
Degottex G, Kane J, Drugman T, Raitio T, Scherer S (2014) Covarep’a collaborative voice analysis repository for speech technologies. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 960–964. IEEE
Dehghan A, Tian Y, Torr PHS, Shah M (2015) Target identity-aware network flow for online multiple target tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1146–1154
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: CVPR09
Doctor J (2016) Sairen project. https://doctorj.gitlab.io/sairen/. Accessed 05 June 2019
Dong X, Shen J, Wang W, Liu Y, Shao L, Porikli F (2018) Hyperparameter optimization for tracking with continuous deep q-learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 518–527
Drugan MM (2019) Reinforcement learning versus evolutionary computation: a survey on hybrid algorithms. Swarm Evol Comput 44:228–246
Espeholt L, Soyer H, Munos R, Simonyan K, Mnih V, Ward T, Doron Y, Firoiu V, Harley T, Dunning I et al (2018) Impala: scalable distributed deep-rl with importance weighted actor-learner architectures. arXiv:1802.01561
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2007) The pascal visual object classes challenge 2007 (voc2007) results
Florensa C, Degrave J, Heess N, Springenberg JT, Riedmiller M (2019) Self-supervised learning of image embedding for continuous control. arXiv:1901.00943
Fortunato M, Azar MG, Piot B, Menick J, Osband I, Graves A, Mnih V, Munos R, Hassabis D, Pietquin O et al (2017) Noisy networks for exploration. arXiv:1706.10295
Fox R, Pakman A, Tishby N (2015) Taming the noise in reinforcement learning via soft updates. arXiv:1512.08562
François-Lavet V , Henderson P, Islam R, Bellemare MG, Pineau J et al (2018) An introduction to deep reinforcement learning. Found Trends\({\textregistered }\) Mach Learn 11(3–4):219–354
Freese M, Singh S, Ozaki F, Matsuhira N (2010) Virtual robot experimentation platform v-rep: a versatile 3d robot simulator. In: International conference on simulation, modeling, and programming for autonomous robots, pp 51–62
Gao Y, Jiang D, Yan X (2018) Optimize taxi driving strategies based on reinforcement learning. Int J Geogr Inf Sci 32(8):1677–1696
Gaskett C, Wettergreen D, Zelinsky A (1999) Q-learning in continuous state and action spaces. In: Australasian joint conference on artificial intelligence, pp 417–428. Springer
Ghadirzadeh A, Maki A, Kragic D, Björkman M (2017) Deep predictive policy training using reinforcement learning. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 2351–2358. IEEE
Ghesu F-C, Georgescu B, Zheng Y, Grbic S, Maier A, Hornegger J, Comaniciu D (2019) Multi-scale deep reinforcement learning for real-time 3d-landmark detection in ct scans. IEEE Trans Pattern Anal Mach Intell 41(1):176–189
Gleave A, Dennis M, Wild C, Kant N, Levine S, Russell S (2019) Adversarial policies: attacking deep reinforcement learning. arXiv:1905.10615
Goyal A, Brakel P, Fedus W, Lillicrap T, Levine S, arochelle H, Bengio Y (2018) Recall traces: backtracking models for efficient reinforcement learning. arXiv:1804.00379
Goyal P, Malik H, Sharma R (2019) Application of evolutionary reinforcement learning (erl) approach in control domain: a review. In: Smart innovations in communication and computational sciences, pp 273–288. Springer
Gu S, Lillicrap T, Sutskever I, Levine S (2016) Continuous deep q-learning with model-based acceleration. In: International conference on machine learning, pp 2829–2838
Guo X, Singh S, Lee H, Lewis RL, Wang X (2014) Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In: Advances in neural information processing systems, pp 3338–3346
Guo Y, Yu R, An J, Yang K, He Y, Leung VCM (2019) Buffer-aware streaming in small scale wireless networks: a deep reinforcement learning approach. IEEE Trans Veh Technol 68(7):6891–6902
Gupta S, Sangeeta R, Mishra RS, Singal G, Badal T, Garg D (2020) Corridor segmentation for automatic robot navigation in indoor environment using edge devices. Comput Networks 178:107374
Gygli M, Grabner H, Riemenschneider H, Van Gool L (2014) Creating summaries from user videos. In: European conference on computer vision, pp 505–520. Springer
Gygli M, Grabner H, Van Gool L (2015) Video summarization by learning submodular mixtures of objectives. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3090–3098
Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv:1801.01290
Hafner R, Riedmiller M (2011) Reinforcement learning in feedback control. Mach Learn 84(1–2):137–169
Hanna JP, Stone P (2017) Grounded action transformation for robot learning in simulation. In: Thirty-first AAAI conference on artificial intelligence
Hasselt HV (2010) Double q-learning. In: Advances in neural information processing systems, pp 2613–2621
He X, Wang K, Huang H, Miyazaki T, Wang Y, Guo S (2018) Green resource allocation based on deep reinforcement learning in content-centric IoT. IEEE Trans Emerg Top Comput 8(3):781–796
He Y, Lin J, Liu Z, Wang H, Li L-J, Han S (2018) Amc: automl for model compression and acceleration on mobile devices. In: Proceedings of the European conference on computer vision (ECCV), pp 784–800
Henderson P, Islam R, Bachman P, Pineau J, Precup D, Meger D (2018) Deep reinforcement learning that matters. In: Thirty-second AAAI conference on artificial intelligence
Hessel M, Modayil J, Van Hasselt H, Schaul T, Ostrovski G, Dabney W, Horgan D, Piot B, Azar M, Silver D (2018) Rainbow: combining improvements in deep reinforcement learning. In: Thirty-second AAAI conference on artificial intelligence
Hosseini MJ, Hajishirzi H, Etzioni O, Kushman N (2014) Learning to solve arithmetic word problems with verb categorization. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 523–533
Huang D, Shi S, Lin C-Y, Yin J, Ma W-Y (2016) How well do computers solve math word problems? Large-scale dataset construction and evaluation
Huang GB, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: a database forstudying face recognition in unconstrained environments. In: Workshop on faces in ’real-life’ images: detection, alignment, and recognition
Huang W, Mordatch I, Pathak D (2020) One policy to control them all: shared modular policies for agent-agnostic control. arXiv:2007.04976
Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG (2012) Zinc: a free tool to discover chemistry for biology. J Chem Inf Model 52(7):1757–1768
Jaafra Y, Laurent JL, Deruyver A, Naceur MS (2019) Seeking for robustness in reinforcement learning: application on Carla simulator. In: International Conference on Machine Learning (ICML) Workshop RL4RealLife Submission. Modified Date: (07 Jun 2019).
Jaques N, Gu S, Turner RE, Eck D (2016) Generating music by fine-tuning recurrent neural networks with reinforcement learning. In: Deep Reinforcement Learning Workshop, NIPS.
Jaritz M, De Charette R, Toromanoff M, Perot E, Nashashibi F (2018) End-to-end race driving with deep reinforcement learning. In: IEEE international conference on robotics and automation (ICRA), pp 070–2075. IEEE
Jesorsky O, Kirchberg KJ, Frischholz RW (2001) Robust face detection using the hausdorff distance. In: International conference on audio-and video-based biometric person authentication, pP 90–95. Springer
Jiang Z, Xu D, Liang J (2017) A deep reinforcement learning framework for the financial portfolio management problem. arXiv:1706.10059
Jin O, El-Saawy H (2016) Portfolio management using reinforcement learning. Technical report, working paper, Stanford University
Johnson AEW, Pollard TJ, Shen L, Li-wei HL, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG (2016) Mimic-iii, a freely accessible critical care database. Sci Data 3:160035
Johnson M, Hofmann K, Hutton T, Bignell DD (2016) The malmo platform for artificial intelligence experimentation. In: IJCAI, pp 4246–4247
Jonsson A (2019) Deep reinforcement learning in medicine. Kidney Dis 5(1):3–7
Juliani A, Berges V-P, Vckay E, Gao Y, Henry H, Mattar M, Lange D (2018) Unity: a general platform for intelligent agents. arXiv:1809.02627
Kalashnikov D, Irpan A, Pastor P, Ibarz J, Herzog A, Jang E, Quillen D, Holly E, Kalakrishnan M, Vanhoucke V et al (2018) Qt-opt: scalable deep reinforcement learning for vision-based robotic manipulation. arXiv:1806.10293
Kanehira A, Van Gool L, Ushiku Y, Harada T (2018) Viewpoint-aware video summarization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Salt Lake City, UT, USA, pp 18–22
Kang K, Belkhale S, Kahn G, Abbeel P, Levine S (2019) Generalization through simulation: integrating simulated and real data into deep reinforcement learning for vision-based autonomous flight. arXiv:1902.03701
Kaplan R, Sauer C, Sosa A (2017) Beating Atari with natural language guided reinforcement learning. arXiv:1704.05539
Kauchak D (2013) Improving text simplification language modeling using unsimplified text data. In: Proceedings of the 51st annual meeting of the association for computational linguistics. Long papers, vol 1, pp 1537–1546
Ke J, Xiao F, Yang H, Ye J (2019) Optimizing online matching for ride-sourcing services with multi-agent deep reinforcement learning. arXiv:1902.06228
Kempka M, Wydmuch M, Runc G, Toczek J, Jaśkowski W (2016) Vizdoom: a doom-based ai research platform for visual reinforcement learning. In: IEEE conference on computational intelligence and games (CIG), pp 1–8. IEEE
Khosla A, Hamid R, Lin C-J, Sundaresan N (2013) Large-scale video summarization using web-image priors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2698–2705
Kim M, Kumar S, Pavlovic V, Rowley H (2008) Face tracking and recognition with visual constraints in real-world videos. In: IEEE conference on computer vision and pattern recognition, pp 1–8. IEEE
Koch W (2019) Flight controller synthesis via deep reinforcement learning. arXiv:1909.06493
Kolve E, Mottaghi R, Gordon D, Zhu Y, Gupta A, Farhadi A (2017) Ai2-thor: an interactive 3d environment for visual ai. arXiv:1712.05474
Kristan M et al (2013) The visual object tracking vot2013 challenge results. In: IEEE international conference on computer vision workshops. IEEE
Kristan M et al (2015) The visual object tracking vot2014 challenge results. In: Agapito L, Bronstein M, Rother C (eds) Computer vision—ECCV 2014 workshops. ECCV 2014. Lecture notes in computer science, pp 191–217. Springer, Cham
Kristan M, Matas J, Leonardis A, Felsberg M, Cehovin L, Fernandez G, Vojir T, Hager G, Nebehay G, Pflugfelder R (2015) The visual object tracking vot2015 challenge results. In: Proceedings of the IEEE international conference on computer vision workshops, pp 1–23
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, Citeseer
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Lange S, Riedmiller M (2010) Deep auto-encoder neural networks in reinforcement learning. In: The international joint conference on neural networks (IJCNN), pp 1–8. IEEE
Lange S, Riedmiller M, Voigtländer A (2012) Autonomous reinforcement learning on raw visual input data in a real world application. In: The international joint conference on neural networks (IJCNN), pp 1–8. IEEE
Lazaric A, Restelli M, Bonarini A (2008) Reinforcement learning in continuous action spaces through sequential Monte Carlo methods. In: Advances in neural information processing systems, pp 833–840
Lee H-Y, Chung P-H, Wu Y-C, Lin T-H, Wen T-H (2018) Interactive spoken content retrieval by deep reinforcement learning. IEEE/ACM Trans Audio Speech Lang Process 26(12):2447–2459
Lee YJ, Ghosh J, Grauman K (2012) Discovering important people and objects for egocentric video summarization. In: IEEE conference on computer vision and pattern recognition, pp 1346–1353. IEEE
Leuenberger G, Wiering MA (2018) Actor-critic reinforcement learning with neural networks in continuous games. In: ICAART (2), pp 53–60
Leurent E (2018) An environment for autonomous driving decision-making. https://github.com/eleurent/highway-env. Accessed on 1 Jun 2020
Li C-H, Wu S-L, Liu C-L, Lee H (2018) Spoken squad: a study of mitigating the impact of speech recognition errors on listening comprehension. arXiv:1804.00320
Li J, Monroe W, Ritter A, Galley M, Gao J, Jurafsky D (2016) Deep reinforcement learning for dialogue generation. arXiv:1606.01541
Li Y (2017) Deep reinforcement learning: an overview. arXiv:1701.07274
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv:1509.02971
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision, pp 740–755. Springer
Liu H, Liu K, Zhao Q (2011) Logarithmic weak regret of non-Bayesian restless multi-armed bandit. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1968–1971. IEEE
Liu K, Zhao Q (2010) Indexability of restless bandit problems and optimality of whittle index for dynamic multichannel access. IEEE Trans Inf Theory 56(11):5547–5567
Liu L, Hodgins J (2018) Learning basketball dribbling skills using trajectory optimization and deep reinforcement learning. ACM Trans Graph 37(4):142
Liu S, Ngiam KY, Feng M (2019) Deep reinforcement learning for clinical decision support: A brief survey. arXiv:1907.09475
Liu X, Xu Q, Chau T, Mu Y, Zhu L, Yan S (2018) Revisiting jump-diffusion process for visual tracking: a reinforcement learning approach. IEEE Trans Circuits Syst Video Technol 29(8):2431–2441
Lopez NG, Nuin YLE, Moral EB, Juan LUS, Rueda AS, Vilches VM, Kojcev R (2019) gym-gazebo2, a toolkit for reinforcement learning using ros 2 and gazebo
Lopez-Martinez D, Eschenfeldt P, Ostvar S, Ingram M, Hur C, Picard R (2019) Deep reinforcement learning for optimal critical care pain management with morphine using dueling double-deep q networks. arXiv:1904.11115
Lowe R, Wu Y, Tamar A, Harb J, Abbeel OAIP, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems, pp 6379–6390
Lowrey K, Kolev S, Dao J, Rajeswaran A, Todorov E (2018) Reinforcement learning for non-prehensile manipulation: transfer from simulation to physical system. In: IEEE international conference on simulation, modeling, and programming for autonomous robots (SIMPAR), pp 35–42. IEEE
Luo W, Sun P, Zhong F, Liu W, Zhang T, Wang Y (2019) End-to-end active object tracking and its real-world deployment via reinforcement learning. IEEE Trans Pattern Anal Mach Intell 42:1317–1332
Luong NC, Hoang DT, Gong S, Niyato D, Wang P, Liang Y-C, Kim DI (2019) Applications of deep reinforcement learning in communications and networking: a survey. IEEE Commun Surv Tutor 21(4):3133–3174
Mahasseni B, Lam M, Todorovic S (2017) Unsupervised video summarization with adversarial lstm networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 202–211
Mahmud M, Kaiser MS, Hussain A, Vassanelli S (2018) Applications of deep learning and reinforcement learning to biological data. IEEE Trans Neural Netw Learn Syst 29(6):2063–2079
Maicas G, Carneiro G, Bradley AP, Nascimento JC, Reid I (2017) Deep reinforcement learning for active breast lesion detection from dce-mri. In: International conference on medical image computing and computer-assisted intervention, pp 665–673. Springer
Man Y, Huang Y, Feng J, Li X, Wu F (2019) Deep q learning driven ct pancreas segmentation with geometry-aware u-net. IEEE Trans Med Imaging 38(8):1971–1980
Manjari K, Verma M, Singal G (2020) A survey on Assistive Technology for visually impaired. Int Things 11:100188
Mao H, Alizadeh M, Menache I, Kandula S (2016) Resource management with deep reinforcement learning. In: Proceedings of the 15th ACM workshop on hot topics in networks, pp 50–56. ACM
McClymont D, Mehnert A, Trakic A, Kennedy D, Crozier S (2014) Fully automatic lesion segmentation in breast mri using mean-shift and graph-cuts on a region adjacency graph. J Magn Reson Imaging 39(4):795–804
Microsoft (2014) Bonsai: Drl for industrial applications. https://www.bons.ai/ and https://aischool.microsoft.com/en-us/autonomous-systems/learning-paths. Accessed 30 May 2019
Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. In: International conference on machine learning, pp 1928–1937
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. arXiv:1312.5602
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529
Mordatch I, Lowrey K, Todorov E (2015) Ensemble-cio: full-body dynamic motion planning that transfers to physical humanoids. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 5307–5314. IEEE
Mueller SG, Weiner MW, Thal LJ, Petersen RC, Jack C, Jagust W, Trojanowski JQ, Toga AW, Beckett L (2005) The Alzheimer’s disease neuroimaging initiative. Neuroimaging Clin 15(4):869–877
Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4293–4302
Naparstek O, Cohen K (2018) Deep multi-user reinforcement learning for distributed dynamic spectrum access. IEEE Trans Wirel Commun 18(1):310–323
Oh J, Guo X, Lee H, Lewis RL, Singh S (2015) Action-conditional video prediction using deep networks in Atari games. In: Advances in neural information processing systems, pp 2863–2871
Oh J, Guo Y, Singh S, Lee H (2018) Self-imitation learning. arXiv:1806.05635
Oh J, Hessel M, Czarnecki WM, Xu Z, van Hasselt H, Singh S, Silver D (2020) Discovering reinforcement learning algorithms. arXiv:2007.08794
Ortner R, Ryabko D, Auer P, Munos R (2012) Regret bounds for restless Markov bandits. In: International conference on algorithmic learning theory, pp 214–228. Springer
Ota K, Oiki T, Jha DK, Mariyama T, Nikovski D (2020) Can increasing input dimensionality improve deep reinforcement learning? arXiv:2003.01629
Pan L, Cai Q, Fang Z, Tang P, Huang L (2019) A deep reinforcement learning framework for rebalancing dockless bike sharing systems. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 1393–1400
Pan X, Seita D, Gao Y, Canny J (2019) Risk averse robust adversarial reinforcement learning. In: International conference on robotics and automation (ICRA), pp 8522–8528. IEEE
Panse A, Madheshia T, Sriraman A, Karande S (2018) Imitation learning on Atari using non-expert human annotations
Paulus R, Xiong C, Socher R (2017) A deep reinforced model for abstractive summarization. arXiv:1705.04304
Peng XB, Andrychowicz M, Zaremba W, Abbeel P (2018) Sim-to-real transfer of robotic control with dynamics randomization. In: IEEE international conference on robotics and automation (ICRA), pp 1–8. IEEE
Pennington J, Socher R, Manning C (2014) Glove: global vectors for word representation. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Pinto L, Andrychowicz M, Welinder P, Zaremba W, Abbeel P (2017) Asymmetric actor critic for image-based robot learning. arXiv:1710.06542
Popova M, Isayev O, Tropsha A (2018) Deep reinforcement learning for de novo drug design. Sci Adv 4(7):eaap7885
Rajeswaran A, Ghotra S, Ravindran B, Levine S (2016) Epopt: learning robust neural network policies using model ensembles. arXiv:1610.01283
Ramani D (2019) A short survey on memory based reinforcement learning. arXiv:1904.06736
Rao Y, Lu J, Zhou J (2017) Attention-aware deep reinforcement learning for video face recognition. In: Proceedings of the IEEE international conference on computer vision, pp 3931–3940
Rawlik K, Toussaint M, Vijayakumar S (2013) On stochastic optimal control and reinforcement learning by approximate inference. In: Twenty-third international joint conference on artificial intelligence
Ray A, Achiam J, Amodei D (2019) Benchmarking safe exploration in deep reinforcement learning
Ren Z, Wang X, Zhang N, Lv X, Li L-J (2017) Deep reinforcement learning-based image captioning with embedding reward. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 290–298
Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the rprop algorithm. In: Proceedings of the IEEE international conference on neural networks, vol 1993, pp 586–591. San Francisco
Rochan M, Wang Y (2019) Video summarization by learning from unpaired data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7902–7911
Rohmer E, Singh SPN, Freese M (2013) V-rep: a versatile and scalable robot simulation framework. In: IEEE/RSJ international conference on intelligent robots and systems, pp 1321–1326. IEEE
Ross S, Gordon G, Bagnell D (2011) A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 627–635
Roy S, Roth D (2016) Solving general arithmetic word problems. arXiv:1608.01413
Roy S, Vieira T, Roth D (2015) Reasoning about quantities in natural language. Trans Assoc Comput Linguist 3:1–13
Russ S (2018) Open dynamics engine. Accessed 01 June 2019
Russakovsky O, Deng J, Hao S, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Russel RH (2019) A short survey on probabilistic reinforcement learning. arXiv:1901.07010
Syracuse Research Corporation (1994) Physical/chemical property database–(physprop)
Savva M, Kadian A, Maksymets O, Zhao Y, Wijmans E, Jain B, Straub J, Liu J, Koltun V, Malik J, Parikh D, Batra D (2019) Habitat: a platform for embodied AI research. arXiv:1904.01201
Sadeghi F, Levine S (2016) Cad2rl: real single-image flight without a single real image. arXiv:1611.04201
Sato Y (2019) Model-free reinforcement learning for financial portfolios: a brief survey. arXiv:1904.04973
Saunders W, Sastry G, Stuhlmueller A, Evans O (2018) Trial without error: towards safe reinforcement learning via human intervention. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, pp 2067–2069. International foundation for autonomous agents and multiagent systems
Schaul T, Quan J, Antonoglou I, Silver D (2015) Prioritized experience replay. arXiv:1511.05952
Schulman J, Levine S, Abbeel P, Jordan MI, Moritz P (2015) Trust region policy optimization. In: ICML, vol 37, pp 1889–1897
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv:1707.06347
Shibuya T, Yasunobu S (2011) Reinforcement learning with nonstationary reward depending on the episode. In: IEEE international conference on systems, man, and cybernetics, pp 2145–2150. IEEE
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484
Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354
Smeulders AWM, Chu DM, Cucchiara R, Calderara S, Dehghan A, Shah M (2013) Visual tracking: an experimental survey. IEEE Trans Pattern Anal Mach Intell 36(7):1442–1468
Song X, Chen K, Lei J, Sun L, Wang Z, Xie L, Song M (2016) Category driven deep recurrent neural network for video summarization. In: IEEE international conference on multimedia & expo workshops (ICMEW), pp 1–6. IEEE
Song Y, Vallmitjana J, Stent A, Jaimes A (2015) Tvsum: summarizing web videos using titles. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5179–5187
Stadie BC, Yang G, Houthooft R, Chen X, Duan Y, Wu Y, Abbeel P, Sutskever I (2018) Some considerations on learning to explore via meta-reinforcement learning. arXiv:1803.01118
Suri K, Shi XQ, Plataniotis KN, Lawryshyn YA (2020) Evolve to control: evolution-based soft actor-critic for scalable reinforcement learning. arXiv:2007.13690
Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3(1):9–44
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge
Talpaert V, Sobh I, Kiran BR, Mannion P, Yogamani S, El-Sallab A, Perez P (2019) Exploring applications of deep reinforcement learning for real-world autonomous driving systems. arXiv:1901.01536
Tassa Y, Doron Y, Muldal A, Erez T, Li Y, de Las Casas D, Budden D, Abdolmaleki A, Merel J, Lefrancq A et al (2018) Deepmind control suite. arXiv:1801.00690
Tassa Y, Tunyasuvunakool S, Muldal A, Doron Y, Liu S, Bohez S, Merel J, Erez T, Lillicrap T, Heess N (2020) dm_control: software and tasks for continuous control. arXiv:2006.12983
Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10(7):1633–1685
Tekin C, Liu M (2011) Online learning in opportunistic spectrum access: a restless bandit approach. In: Proceedings IEEE INFOCOM, pp 2462–2470. IEEE
Tetko IV, Sushko Y, Novotarskyi S, Patiny L, Kondratov I, Petrenko AE, Charochkina L, Asiri AM (2014) How accurately can we predict the melting points of drug-like compounds? J Chem Inf Model 54(12):3320–3329
Thrun SB (1992) Efficient exploration in reinforcement learning. In: Technical Report, CMU-CS-92-102, Computer Science Department, Carnegie Mellon University
Traue A, Book G, Kirchgässner W, Wallscheid O (2019) Towards a reinforcement learning environment toolbox for intelligent electric motor control. IEEE Trans Neural Netw Learn Syst
Trnsys (2017) Transient system simulation tool's web platform. http://www.trnsys.com/. Accessed 02 June 2019
Uzkent B, Ermon S (2020) Learning when and where to zoom with deep reinforcement learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12345–12354
Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. In: Thirtieth AAAI conference on artificial intelligence
Vázquez-Canteli JR, Nagy Z (2019) Reinforcement learning for demand response: a review of algorithms and modeling techniques. Appl Energy 235:1072–1089
Veeramsetty V, Singal G, Badal T (2020) Coinnet: platform independent application to recognize Indian currency notes using deep learning techniques. Multimed Tools Appl 79(31–32):22569–22594
Verma S, Nair HS, Agarwal G, Dhar J, Shukla A (2020) Deep reinforcement learning for single-shot diagnosis and adaptation in damaged robots. In: Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, pp 82–89
Vinyals O, Ewalds T, Bartunov S, Georgiev P, Vezhnevets AS, Yeo M, Makhzani A, Küttler H, Agapiou J, Schrittwieser J et al (2017) Starcraft ii: a new challenge for reinforcement learning. arXiv:1708.04782
Walraven E (2020) Solvepomdp. https://www.erwinwalraven.nl/solvepomdp/. Accessed 16 June 2020
Wan M, Gangwani T, Peng J (2020) Mutual information based knowledge transfer under state-action dimension mismatch. arXiv:2006.07041
Wang H-M, Chen B, Kuo J-W, Cheng S-S (2005) Matbn: a mandarin Chinese broadcast news corpus. Int J Comput Linguist Chin Lang Process 10(2):219–236. Special issue on annotated speech corpora
Wang L, Zhang D, Gao L, Song J, Guo L, Shen HT (2018) Mathdqn: solving arithmetic word problems via deep reinforcement learning. In: Thirty-second AAAI conference on artificial intelligence
Wang S, Liu H, Gomes PH, Krishnamachari B (2018) Deep reinforcement learning for dynamic multichannel access in wireless networks. IEEE Trans Cognit Commun Netw 4(2):257–265
Wang Y, Bryant SH, Cheng T, Wang J, Gindulyte A, Shoemaker BA, Thiessen PA, He S, Zhang J (2016) Pubchem bioassay: 2017 update. Nucl Acids Res 45(D1):D955–D963
Wang Z, Li L, Yue X, Tian H, Cui S (2018) Handover control in wireless systems via asynchronous multiuser deep reinforcement learning. IEEE Internet of Things J 5(6):4296–4307
Wang Z, Bapst V, Heess N, Mnih V, Munos R, Kavukcuoglu K, de Freitas N (2016) Sample efficient actor-critic with experience replay. arXiv:1611.01224
Wang Z, Schaul T, Hessel M, Van Hasselt H, Lanctot M, De Freitas N (2015) Dueling network architectures for deep reinforcement learning. arXiv:1511.06581
Wen T-H, Lee H-Y, Su P, Lee L-S (2013) Interactive spoken content retrieval by extended query model and continuous state space Markov decision process. In: IEEE international conference on acoustics, speech and signal processing, pp 8510–8514. IEEE
Weng C, Yu D, Watanabe S, Juang B-HF (2014) Recurrent deep neural networks for robust speech recognition. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5532–5536. IEEE
Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8(3–4):229–256
Wolf L, Hassner T, Maoz I (2011) Face recognition in unconstrained videos with matched background similarity. In: Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE, Piscataway
Woodsend K, Lapata M (2011) Learning to simplify sentences with quasi-synchronous grammar and integer programming. In: Proceedings of the conference on empirical methods in natural language processing, pp 409–420. Association for Computational Linguistics
Wu Y-C, Lin T-H, Chen Y-D, Lee H-Y, Lee L-S (2016) Interactive spoken content retrieval by deep reinforcement learning. arXiv:1609.05234
Wu Y, Lim J, Yang M-H (2013) Online object tracking: a benchmark. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2411–2418
Wu Y, Lim J, Yang M-H (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848
Wu Y, Mansimov E, Grosse RB, Liao S, Ba J (2017) Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In: Advances in neural information processing systems, pp 5279–5288
Wu Y, Hu B (2018) Learning to extract coherent summary via deep reinforcement learning. In: Thirty-second AAAI conference on artificial intelligence
Xia F, Zamir AR, He Z, Sax A, Malik J, Savarese S (2018) Gibson env: real-world perception for embodied agents. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9068–9079
Xiong X, De la Torre F (2013) Supervised descent method and its applications to face alignment. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 532–539
Xu W, Callison-Burch C, Napoles C (2015) Problems in current text simplification research: new data can help. Trans Assoc Comput Linguist 3:283–297
Xu Z, Chen J, Tomizuka M (2020) Guided policy search model-based reinforcement learning for urban autonomous driving. arXiv:2005.03076
Yan X, Shao C, Wei C, Wang Y (2018) Look-ahead insertion policy for a shared-taxi system based on reinforcement learning. IEEE Access 6:5716–5726
Ye H, Li GY (2018) Deep reinforcement learning for resource allocation in v2v communications. In: IEEE international conference on communications (ICC), pp 1–6. IEEE
Ye H, Li GY, Juang B-H (2017) Power of deep learning for channel estimation and signal detection in ofdm systems. IEEE Wirel Commun Lett 7(1):114–117
Yu W, Tan J, Liu CK, Turk G (2017) Preparing for the unknown: learning a universal policy with online system identification. arXiv:1702.02453
Yun S, Choi J, Yoo Y, Yun K, Choi JY (2018) Action-driven visual object tracking with deep reinforcement learning. IEEE Trans Neural Netw Learn Syst 29(6):2239–2252
Zadeh A, Zellers R, Pincus E, Morency L-P (2016) Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages. IEEE Intell Syst 31(6):82–88
Zamora I, Lopez NG, Vilches VM, Cordero AH (2016) Extending the openai gym for robotics: a toolkit for reinforcement learning using ros and gazebo. arXiv:1608.05742
Zhang A, Ballas N, Pineau J (2018) A dissection of overfitting and generalization in continuous reinforcement learning. arXiv:1806.07937
Zhang C, Vinyals O, Munos R, Bengio S (2018) A study on overfitting in deep reinforcement learning. arXiv:1804.06893
Zhang F, Leitner J, Milford M, Corke P (2016) Modular deep q networks for sim-to-real transfer of visuo-motor policies. arXiv:1610.06781
Zhang J, Tai L, Yun P, Xiong Y, Liu M, Boedecker J, Burgard W (2019) Vr-goggles for robots: real-to-sim domain adaptation for visual control. IEEE Robot Autom Lett 4(2):1148–1155
Zhang K, Chao W-L, Sha F, Grauman K (2016) Summary transfer: exemplar-based subset selection for video summarization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1059–1067
Zhang K, Chao W-L, Sha F, Grauman K (2016) Video summarization with long short-term memory. In: European conference on computer vision, pp 766–782. Springer
Zhang L, Tan J, Liang Y-C, Feng G, Niyato D (2019) Deep reinforcement learning based modulation and coding scheme selection in cognitive heterogeneous networks. IEEE Trans Wirel Commun 18(6):3281–3294
Zhang X, Lapata M (2017) Sentence simplification with deep reinforcement learning. arXiv:1703.10931
Zhao B, Li X, Lu X (2018) Hsa-rnn: hierarchical structure-adaptive rnn for video summarization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7405–7414
Zhao P, Wang Y, Chang N, Zhu Q, Lin X (2018) A deep reinforcement learning framework for optimizing fuel economy of hybrid electric vehicles. In: 2018 23rd Asia and South Pacific design automation conference (ASP-DAC), pp 196–202. IEEE
Zhao Q, Krishnamachari B, Liu K (2008) On myopic sensing for multi-channel opportunistic access: structure, optimality, and performance. IEEE Trans Wirel Commun 7(12):5431–5440
Zheng G, Zhang F, Zheng Z, Xiang Y, Yuan NJ, Xie X, Li Z (2018) Drn: a deep reinforcement learning framework for news recommendation. In: Proceedings of the 2018 World Wide Web conference, pp 167–176. International World Wide Web conferences steering committee
Zheng L, Yang J, Cai H, Zhou M, Zhang W, Wang J, Yu Y (2018) Magent: a many-agent reinforcement learning platform for artificial collective intelligence. In: Thirty-second AAAI conference on artificial intelligence
Zhong Z, Yang Z, Feng W, Wei W, Yangyang H, Liu C-L (2019) Decision controller for object tracking with deep reinforcement learning. IEEE Access 7:28069–28079
Zhou K, Qiao Y, Xiang T (2018) Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In: Thirty-second AAAI conference on artificial intelligence
Zhou Z, Li X, Zare RN (2017) Optimizing chemical reactions with deep reinforcement learning. ACS Cent Sci 3(12):1337–1344
Zhu Z, Bernhard D, Gurevych I (2010) A monolingual tree-based translation model for sentence simplification. In: Proceedings of the 23rd international conference on computational linguistics, pp 1353–1361. Association for Computational Linguistics
Funding
Not Applicable.
Author information
Contributions
Idea for the article: SG, GS, DG. Literature search: SG. Initial draft Prepared by: SG. Critical revisions done by: GS, DG. Contribution in writing future scope and challenges: GS, DG, SG.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Table 5 summarizes the key MDP parameters considered across the diversified applications surveyed. It gives insight into the states and actions used in the literature, and shows that many problems can be solved with a simple binary reward function.
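As a minimal illustration of the binary-reward formulation that many of the surveyed problems adopt, the sketch below defines a toy MDP whose reward is 1 only when the agent reaches a goal state. The environment, state space, and goal are hypothetical and not drawn from any specific paper in Table 5.

```python
# Toy MDP with a binary reward function (hypothetical example,
# not from any specific surveyed work).

class ChainMDP:
    """1-D chain: states 0..n-1, actions {-1, +1}, reward 1 only at the goal."""

    def __init__(self, n_states=5, goal=4):
        self.n_states = n_states
        self.goal = goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Move left (-1) or right (+1), clipped to the state space.
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        reward = 1 if self.state == self.goal else 0  # binary reward
        done = self.state == self.goal
        return self.state, reward, done


env = ChainMDP()
env.reset()
total = 0
done = False
while not done:
    state, reward, done = env.step(+1)  # always move right
    total += reward
print(total)  # the binary reward fires exactly once, at the goal
```

The design mirrors the state/action/reward decomposition listed in Table 5: the learning problem is fully specified once these three components are chosen, and a binary reward is often sufficient when only goal attainment matters.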
Cite this article
Gupta, S., Singal, G. & Garg, D. Deep Reinforcement Learning Techniques in Diversified Domains: A Survey. Arch Computat Methods Eng 28, 4715–4754 (2021). https://doi.org/10.1007/s11831-021-09552-3