DOI: 10.1145/3508352.3549437 · ICCAD Conference Proceedings · Research article

DARL: Distributed Reconfigurable Accelerator for Hyperdimensional Reinforcement Learning

Published: 22 December 2022

ABSTRACT

Reinforcement Learning (RL) is a powerful technique for solving decision-making problems such as robotics control. Modern RL algorithms, e.g., Deep Q-Learning, are based on costly and resource-hungry deep neural networks. This motivates us to deploy alternative models for powering RL agents on edge devices. Recently, brain-inspired Hyper-Dimensional Computing (HDC) has been introduced as a promising solution for lightweight and efficient machine learning, particularly for classification.
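To make the HDC idea concrete, here is a minimal software sketch of the encode-and-compare pattern common in hyperdimensional computing, assuming a random-projection encoder that maps a low-dimensional state to a bipolar hypervector. All names (`make_encoder`, `similarity`) and the dimensionality are illustrative, not taken from the paper:

```python
import numpy as np

D = 10_000  # hypervector dimensionality, a typical HDC choice
rng = np.random.default_rng(0)

def make_encoder(state_dim: int, dim: int = D):
    # Each state feature gets a random base vector; the projection
    # spreads state information across all D dimensions.
    proj = rng.standard_normal((dim, state_dim))
    def encode(state: np.ndarray) -> np.ndarray:
        # Project into high-dimensional space and binarize to {-1, +1}.
        return np.sign(proj @ state)
    return encode

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity is the standard HDC distance measure.
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

encode = make_encoder(4)  # e.g., CartPole's 4-dimensional observation
h1 = encode(np.array([0.1, -0.2, 0.03, 0.5]))
h2 = encode(np.array([0.1, -0.2, 0.03, 0.5]))
h3 = encode(np.array([-0.9, 0.8, -0.7, -0.6]))
assert similarity(h1, h2) == 1.0  # identical states encode identically
assert similarity(h1, h3) < 1.0   # different states diverge
```

Because every operation is an elementwise or dot-product step over independent dimensions, this style of model maps naturally onto parallel FPGA datapaths.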

In this work, we develop a novel platform capable of real-time hyperdimensional reinforcement learning. Our heterogeneous CPU-FPGA platform, called DARL, maximizes the FPGA's computing capabilities by applying hardware optimizations to hyperdimensional computing's critical operations, including a hardware-friendly encoder IP, hypervector chunk fragmentation, and the delayed model update. Aside from hardware innovation, we also extend the platform beyond basic single-agent RL to support multi-agent distributed learning. We evaluate the effectiveness of our approach on OpenAI Gym tasks. Our results show that the FPGA platform provides on average a 20× speedup over current state-of-the-art hyperdimensional RL methods running on an Intel Xeon 6226 CPU. In addition, DARL is around 4.8× faster and 4.2× more energy-efficient than the state-of-the-art RL accelerator while ensuring better or comparable quality of learning.
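Two of the optimizations named above can be illustrated in software. The sketch below is a hypothetical analogue, not the paper's implementation: it models chunk fragmentation as a fragment-wise dot product (the way a pipelined datapath would stream a long hypervector) and the delayed model update as a buffered write that flushes every few steps. `HDQModel`, `chunked_dot`, and all parameters are invented for illustration:

```python
import numpy as np

D = 10_000   # hypervector dimensionality (illustrative)
N_ACTIONS = 2
rng = np.random.default_rng(1)

def chunked_dot(a, b, chunk=1024):
    # Hypervector chunk fragmentation: evaluate the long dot product in
    # fixed-size fragments rather than in one pass, mirroring how a
    # hardware pipeline would stream the hypervector through a small PE.
    return sum(float(a[i:i + chunk] @ b[i:i + chunk])
               for i in range(0, len(a), chunk))

class HDQModel:
    """Toy hyperdimensional Q-model: one model hypervector per action,
    with Q(s, a) = <encode(s), model[a]>."""
    def __init__(self, dim=D, n_actions=N_ACTIONS, flush_every=8):
        self.model = np.zeros((n_actions, dim))
        self.pending = np.zeros_like(self.model)  # buffered updates
        self.flush_every = flush_every
        self.steps = 0

    def q_values(self, h_state):
        return np.array([chunked_dot(m, h_state) for m in self.model])

    def update(self, h_state, action, td_error, lr=0.1):
        # Delayed model update: accumulate TD-scaled state hypervectors
        # in a buffer and fold them into the model only every
        # `flush_every` steps, reducing writes to the model memory.
        self.pending[action] += lr * td_error * h_state
        self.steps += 1
        if self.steps % self.flush_every == 0:
            self.model += self.pending
            self.pending[:] = 0.0

# A few synthetic updates: the model stays untouched until the flush.
model = HDQModel()
h = rng.choice([-1.0, 1.0], size=D)
for _ in range(7):
    model.update(h, action=0, td_error=0.5)
assert not model.model.any()   # nothing written yet
model.update(h, action=0, td_error=0.5)
assert model.model.any()       # eighth update triggers the flush
```

In hardware, batching writes this way trades a small staleness in the Q-estimates for substantially less memory traffic, which is typically the bottleneck when the model lives in on-chip BRAM.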


Published in

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022, 1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352
Copyright © 2022 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher: Association for Computing Machinery, New York, NY, United States

Overall Acceptance Rate: 457 of 1,762 submissions, 26%

            Upcoming Conference

            ICCAD '24
            IEEE/ACM International Conference on Computer-Aided Design
            October 27 - 31, 2024
            New York , NY , USA
          • Article Metrics

            • Downloads (Last 12 months)155
            • Downloads (Last 6 weeks)13

            Other Metrics

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader