DOI: 10.1145/3545008.3545067
Research article · Public Access

Accelerating Random Forest Classification on GPU and FPGA

Published: 13 January 2023

ABSTRACT

Random Forests (RFs) are a commonly used machine learning method for classification and regression tasks spanning a variety of application domains, including bioinformatics, business analytics, and software optimization. While prior work has focused primarily on improving the performance of RF training, many applications, such as malware identification, cancer prediction, and banking fraud detection, require fast RF classification.

In this work, we accelerate RF classification on GPU and FPGA. In order to provide efficient support for large datasets, we propose a hierarchical memory layout suited to the GPU/FPGA memory hierarchy. We design three RF classification code variants based on this layout, and we investigate GPU- and FPGA-specific considerations for these kernels. Our experimental evaluation, performed on an Nvidia Xp GPU and on a Xilinx Alveo U250 FPGA accelerator card using publicly available datasets on the scale of millions of samples and tens of features, covers several aspects. First, we evaluate the performance benefits of our hierarchical data structure over the standard compressed sparse row (CSR) format. Second, we compare our GPU implementation with cuML, a machine learning library targeting Nvidia GPUs. Third, we explore the performance/accuracy tradeoff resulting from the use of different tree depths in the RF. Finally, we perform a comparative performance analysis of our GPU and FPGA implementations. Our evaluation shows that our code variants outperform the CSR baseline on both GPU and FPGA, with the GPU delivering the best overall performance. For high accuracy targets, our GPU implementation yields a 5-9× speedup over CSR, and up to a 2× speedup over Nvidia's cuML library.
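The classification step being accelerated, traversing every tree in the forest for a sample and taking a majority vote over the leaf predictions, can be sketched on trees flattened into parallel node arrays. This is a simplified stand-in for the layouts discussed in the abstract, not the paper's actual hierarchical structure; the function names and the toy forest below are illustrative.

```python
import numpy as np

def classify(sample, feature, threshold, left, right, value):
    """Traverse one flattened decision tree for a single sample.

    Node i tests sample[feature[i]] <= threshold[i]; leaves are
    marked by left[i] == -1 and store their class in value[i].
    """
    node = 0
    while left[node] != -1:                       # internal node
        if sample[feature[node]] <= threshold[node]:
            node = left[node]
        else:
            node = right[node]
    return int(value[node])                       # leaf: predicted class

def forest_predict(sample, trees, n_classes):
    """Majority vote over all trees in the forest."""
    votes = np.zeros(n_classes, dtype=int)
    for t in trees:
        votes[classify(sample, *t)] += 1
    return int(np.argmax(votes))

# Toy forest: two depth-1 trees over a 2-feature sample.
# Arrays per tree: feature, threshold, left, right, value.
tree0 = (np.array([0, -1, -1]), np.array([0.5, 0.0, 0.0]),
         np.array([1, -1, -1]), np.array([2, -1, -1]),
         np.array([-1, 0, 1]))
tree1 = (np.array([1, -1, -1]), np.array([0.3, 0.0, 0.0]),
         np.array([1, -1, -1]), np.array([2, -1, -1]),
         np.array([-1, 1, 0]))

print(forest_predict(np.array([0.9, 0.1]), [tree0, tree1], n_classes=2))  # prints 1
```

Because every sample follows its own data-dependent path through each tree, inference is dominated by irregular memory accesses into the node arrays, which is why the layout of those arrays in the GPU/FPGA memory hierarchy matters for performance.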



Published in
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022, 976 pages
ISBN: 9781450397339
DOI: 10.1145/3545008

Copyright © 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher
Association for Computing Machinery, New York, NY, United States



Acceptance Rates
Overall acceptance rate: 91 of 313 submissions, 29%