ABSTRACT
Reinforcement Learning (RL) is a powerful technique for solving decision-making problems such as robotics control. Modern RL algorithms, e.g., Deep Q-Learning, are based on costly and resource-hungry deep neural networks. This motivates us to deploy alternative models for powering RL agents on edge devices. Recently, brain-inspired Hyper-Dimensional Computing (HDC) has been introduced as a promising solution for lightweight and efficient machine learning, particularly for classification.
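To make the contrast with deep networks concrete, the sketch below shows the kind of lightweight arithmetic an HDC-based RL agent relies on: a random-projection encoder maps a low-dimensional state into a hypervector, and per-action model hypervectors yield Q-value estimates as simple dot products. This is a minimal illustration assuming a cosine random-projection encoder and a delta-style update; the dimensionality, names, and update rule are illustrative choices, not DARL's exact design.

```python
import numpy as np

D = 4096          # hypervector dimensionality (illustrative; real designs vary)
STATE_DIM = 4     # e.g., CartPole's 4-dimensional observation
N_ACTIONS = 2

rng = np.random.default_rng(0)
# Random-projection basis: one D-dimensional base vector per state feature.
basis = rng.normal(size=(STATE_DIM, D))
base_phase = rng.uniform(0, 2 * np.pi, size=D)

def encode(state):
    """Map a low-dimensional state to a D-dimensional hypervector
    via a random projection followed by a cosine nonlinearity."""
    return np.cos(state @ basis + base_phase)

# One model hypervector per action; Q(s, a) is a dot product with the encoded state.
model = np.zeros((N_ACTIONS, D))

def q_values(state_hv):
    """Q-value estimates for all actions given an encoded state."""
    return model @ state_hv

def td_update(state_hv, action, td_error, lr=0.05):
    """Bundle the encoded state into the chosen action's model hypervector,
    scaled by the temporal-difference error (delta-rule style update)."""
    model[action] += lr * td_error * state_hv
```

Every operation here is element-wise or a dot product over fixed-length vectors, which is what makes HDC attractive for edge and FPGA deployment compared to multi-layer neural networks.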
In this work, we develop a novel platform capable of real-time hyperdimensional reinforcement learning. Our heterogeneous CPU-FPGA platform, called DARL, maximizes the FPGA's computing capabilities by applying hardware optimizations to the critical operations of hyperdimensional computing, including a hardware-friendly encoder IP, hypervector chunk fragmentation, and delayed model update. Beyond these hardware innovations, we also extend the platform from basic single-agent RL to support multi-agent distributed learning. We evaluate the effectiveness of our approach on OpenAI Gym tasks. Our results show that the FPGA platform provides a 20× speedup on average over state-of-the-art hyperdimensional RL methods running on an Intel Xeon 6226 CPU. In addition, DARL is around 4.8× faster and 4.2× more energy efficient than the state-of-the-art RL accelerator while ensuring better or comparable quality of learning.
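As a usage illustration, the loop below sketches how an HDC agent of this kind could be trained and evaluated on an OpenAI Gym task such as CartPole-v1, reusing the `encode`, `q_values`, `td_update`, and `rng` definitions from the sketch above. It assumes the classic Gym reset/step API (4-tuple step return, pre-0.26 `gym`); the epsilon, gamma, and learning-rate values are placeholders, not the paper's tuned settings.

```python
import gym  # classic OpenAI Gym API, as referenced by the evaluated tasks

def run_episode(env, epsilon=0.1, gamma=0.99):
    """One epsilon-greedy episode of HDC-based Q-learning on a Gym task."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        hv = encode(np.asarray(state))
        q = q_values(hv)
        # Epsilon-greedy action selection over the dot-product Q estimates.
        action = env.action_space.sample() if rng.random() < epsilon else int(np.argmax(q))
        next_state, reward, done, _ = env.step(action)
        # One-step TD target; bootstrap from the encoded next state unless terminal.
        target = reward if done else reward + gamma * np.max(q_values(encode(np.asarray(next_state))))
        td_update(hv, action, target - q[action])
        state = next_state
        total_reward += reward
    return total_reward

env = gym.make("CartPole-v1")
print(run_episode(env))
```

The encoding, dot-product inference, and bundling updates in this loop are examples of the critical HDC operations that DARL's hardware optimizations (encoder IP, hypervector chunk fragmentation, delayed model update) target on the FPGA.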