ABSTRACT
The lattice Boltzmann method is a highly scalable Navier-Stokes solver that has been applied to flow problems in a wide range of domains. However, the method is bandwidth-bound on modern GPU accelerators and has a large memory footprint. In this paper, we present new 2D and 3D GPU implementations of two regularized lattice Boltzmann methods. Relative to reference lattice Boltzmann implementations, they achieve a speedup of about 1.4× and reduce memory requirements by up to 35% in 2D and 47% in 3D simulations. These new approaches are evaluated on NVIDIA and AMD GPU architectures.
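To give a rough sense of where a memory saving of this kind can come from, the sketch below counts stored values per lattice node. It rests on assumptions the abstract does not state: that the reference implementations store one distribution value per discrete velocity on D2Q9 and D3Q19 lattices, and that a moment representation stores only density, velocity, and the symmetric second-order moment tensor. The function names are hypothetical, and the resulting ratios are an idealized count, not the paper's measured figures.

```python
# Idealized per-node storage count; all lattice choices here are assumptions,
# not details taken from the abstract.

def moments_per_node(dim: int) -> int:
    """Values stored per node in an assumed moment representation:
    density (1) + velocity (dim) + symmetric second-order tensor (dim*(dim+1)/2)."""
    return 1 + dim + dim * (dim + 1) // 2


def footprint_reduction(q: int, dim: int) -> float:
    """Fractional reduction when `q` distribution values per node are
    replaced by the moments counted above."""
    return 1.0 - moments_per_node(dim) / q


if __name__ == "__main__":
    # Assumed lattices: D2Q9 (q=9, dim=2) and D3Q19 (q=19, dim=3).
    for name, q, dim in (("D2Q9", 9, 2), ("D3Q19", 19, 3)):
        print(f"{name}: {q} -> {moments_per_node(dim)} values per node, "
              f"reduction {footprint_reduction(q, dim):.1%}")
```

Under these assumptions the count drops from 9 to 6 values per node in 2D and from 19 to 10 in 3D, that is, by roughly a third and by just under a half. Actual savings depend on the propagation pattern, halo storage, and data layout that an implementation uses.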
Index Terms
- Moment Representation of Regularized Lattice Boltzmann Methods on NVIDIA and AMD GPUs