
Deep Reinforcement Learning Techniques in Diversified Domains: A Survey

  • Original Paper
  • Published in Archives of Computational Methods in Engineering

Abstract

There have been tremendous improvements in deep learning and reinforcement learning techniques, yet automating learning and intelligence to the full extent remains a challenge. The amalgamation of Reinforcement Learning and Deep Learning has brought breakthroughs in games and robotics over the past decade. Deep Reinforcement Learning (DRL) trains an agent on raw input, learning through interaction with the environment. Motivated by recent successes of DRL, we explore its adaptability to different domains and application areas. This paper presents a comprehensive survey of the work done in recent years and of the simulation tools used for DRL. The current focus of researchers is on recording experience more effectively and on refining the policy for future moves. We find that even after good results in Atari, Go, robotics, and multi-agent scenarios, challenges remain, such as generalization, satisfying multiple objectives, divergence, and learning robust policies. Furthermore, complex environments and multiple agents pose new challenges and remain an open area of research.
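To make the agent-environment loop mentioned above concrete, the following is a minimal sketch in Python using the OpenAI Gym toolkit, one of the simulation platforms covered in this survey. It assumes the classic Gym API (pre-0.26), where reset() returns an observation and step() returns a 4-tuple; the environment name, episode budget, and random action selection are illustrative choices, not specifics from the paper.

    import gym

    # Minimal DRL-style interaction loop: the agent observes raw input from
    # the environment, acts, and receives a scalar reward. A learned policy
    # (e.g., a deep Q-network) would replace the random action below.
    env = gym.make("CartPole-v1")  # illustrative environment choice

    for episode in range(5):  # arbitrary episode budget
        obs = env.reset()  # initial raw observation
        done, total_return = False, 0.0
        while not done:
            action = env.action_space.sample()  # placeholder for a learned policy
            obs, reward, done, info = env.step(action)  # interact, observe reward
            total_return += reward
        print(f"episode {episode}: return = {total_return}")

    env.close()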



Availability of Data and Materials

Not Applicable.


Funding

Not Applicable.

Author information


Contributions

Idea for the article: SG, GS, DG. Literature search: SG. Initial draft prepared by: SG. Critical revisions: GS, DG. Writing of future scope and challenges: GS, DG, SG.

Corresponding author

Correspondence to Gaurav Singal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 5 summarizes the key MDP parameters (state, action, and reward) considered in diversified applications. It gives insight into the states and actions used in the literature, and shows that many problems can be solved with a binary reward function.

Table 5 DRL key MDP parameters for diversified applications
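As a concrete illustration of how such MDP parameters can be encoded, the sketch below defines a hypothetical grid-world problem with the kind of binary reward function Table 5 highlights; the state representation, action set, and goal are invented for illustration and are not drawn from any specific entry in the table.

    from dataclasses import dataclass
    from typing import Tuple

    State = Tuple[int, int]  # hypothetical state: the agent's (row, col) on a 4x4 grid

    @dataclass
    class GridMDP:
        # MDP parameters in the style of Table 5: state, action set, binary reward.
        goal: State = (3, 3)
        actions = ("up", "down", "left", "right")

        def step(self, state: State, action: str) -> Tuple[State, float]:
            moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
            dr, dc = moves[action]
            row, col = state
            # Clamp the move to the grid boundaries.
            next_state = (max(0, min(3, row + dr)), max(0, min(3, col + dc)))
            # Binary reward: 1 on reaching the goal state, 0 otherwise.
            reward = 1.0 if next_state == self.goal else 0.0
            return next_state, reward

    # Usage: one transition that reaches the goal.
    mdp = GridMDP()
    print(mdp.step((3, 2), "right"))  # -> ((3, 3), 1.0)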


About this article


Cite this article

Gupta, S., Singal, G. & Garg, D. Deep Reinforcement Learning Techniques in Diversified Domains: A Survey. Arch Computat Methods Eng 28, 4715–4754 (2021). https://doi.org/10.1007/s11831-021-09552-3

