Minimizing development costs for efficient many-core visualization using MCD3
Introduction
As supercomputers increasingly adopt many-core architectures, visualization software design must adapt to support them. This is a significant challenge, since existing parallel visualization software, like ParaView [1] and VisIt [2], has primarily focused on MPI-only parallelism. Further, these efforts represent hundreds of person-years of work and contain hundreds of algorithms. As a result, new, many-core-capable designs must achieve twin goals: efficient performance and small development time. This latter goal is particularly necessary not only because of the large number of visualization algorithms but also because of the large number of potential hardware architectures. The worst case would require an implementation for every possible pair of algorithm and architecture. A better scenario is to achieve “portable performance”, i.e., a single, hardware-agnostic implementation of each algorithm that runs efficiently on all architectures.
Data-parallel primitives [3] (DPPs) have the potential to achieve the twin goals of efficient performance and small development time, but applying them in the visualization space is non-trivial. In particular, the nature of scientific data (mesh-based data) incurs an extra burden with DPPs, as most DPPs are designed to operate on arrays of data and are not concerned with issues such as which vertices lie within a cell. This has two unfortunate consequences for implementing visualization algorithms with DPPs. First, these complex data structures place an increased indexing burden on the developer, who has to maintain and follow links in the data. Second, the irregular data typical of visualization algorithms means that the products of a visualization algorithm do not always have a one-to-one relationship with the input. Some inputs contribute nothing to the output whereas other inputs generate multiple outputs, creating a “jagged” access pattern for the inputs and outputs of a visualization algorithm. As a result, using DPPs could cause more harm than good — visualization algorithm developers could spend more time addressing data model issues than they would gain from DPPs’ benefits.
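To make the “jagged” access pattern concrete, the following sketch shows the classic three-pass DPP idiom for a kernel whose cells each emit a variable number of outputs (as in, e.g., isosurfacing): a Map to count outputs per cell, a Scan to turn counts into write offsets, and a second Map to fill the output. The function name and structure are illustrative, not the paper's actual implementation.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Expand a jagged one-to-many relationship with plain DPPs.
// countPerCell[i] = number of outputs cell i generates (may be zero).
// Returns, for each output element, the index of the cell that produced it.
std::vector<int> expand(const std::vector<int>& countPerCell) {
    // Pass 2 (Scan): an exclusive prefix sum of the counts gives each
    // cell its private write offset into the output array.
    std::vector<int> offsets(countPerCell.size(), 0);
    std::exclusive_scan(countPerCell.begin(), countPerCell.end(),
                        offsets.begin(), 0);

    // Total output size = last offset + last count.
    const int total =
        countPerCell.empty() ? 0 : offsets.back() + countPerCell.back();

    // Pass 3 (Map): each cell writes its outputs at its own offset.
    // The loop body is independent per cell, so it parallelizes directly.
    std::vector<int> output(total);
    for (std::size_t cell = 0; cell < countPerCell.size(); ++cell)
        for (int j = 0; j < countPerCell[cell]; ++j)
            output[offsets[cell] + j] = static_cast<int>(cell);
    return output;
}
```

For counts {2, 0, 3, 1}, the offsets are {0, 2, 2, 5} and the output is {0, 0, 2, 2, 2, 3}. Every DPP-based visualization algorithm with jagged output repeats some variant of this bookkeeping, which is exactly the burden the paper's meta-DPPs aim to absorb.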
With this paper, we introduce a design that augments DPPs with several constructs to shield visualization algorithm developers from these pitfalls. Our design incorporates existing data management practices that separate memory layout from execution space, adding support for many data layouts and mesh types. We also incorporate the common practice of providing convenience routines for common operations, with our unique contribution being a selection of routines useful for scientific visualization algorithms, such as locating which cell contains a point or finding the minimum and maximum values of a field. Finally, and most importantly, we introduce a new construct, which we call meta-DPPs, that combines DPPs and data management to address issues such as following links in data and jagged access patterns. In all, we refer to our algorithm development environment as MCD3 — Meta-DPPs, Convenience routines, Data management, DPPs, and Devices.
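As a minimal sketch of the convenience-routine idea, consider computing a field's range (its minimum and maximum values), one of the operations named above. The function name below is hypothetical, not the actual VTK-m API; a sequential `minmax` stands in for what would be a device-side Reduce DPP in a many-core library.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical "field range" convenience routine. Without such a routine,
// every algorithm that needs a field's min/max (e.g., to normalize a color
// map) would re-implement the same reduction. Precondition: field is
// non-empty.
std::pair<float, float> fieldRange(const std::vector<float>& field) {
    // In a many-core library this would dispatch a parallel Reduce DPP
    // on the active device; here std::minmax_element models the pattern.
    auto [lo, hi] = std::minmax_element(field.begin(), field.end());
    return {*lo, *hi};
}
```

The value of packaging this as a library routine is that the reduction is written (and tuned per device) once, rather than once per algorithm.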
The fundamental research questions of this paper concern the overall viability of the MCD3 approach for many-core visualization and whether it meets the twin goals of minimizing development costs while providing efficient execution times. These questions are explored by analyzing the VTK-m library [4], an open source effort that uses MCD3 for many-core visualization. In all, the contributions of this paper are:
- New meta-DPPs designed for common patterns in scientific visualization algorithms;
- Evaluation of developer costs by studying 57 visualization algorithms and their usage of MCD3;
- Evaluation of efficient execution time by surveying 10 performance studies using MCD3 or DPP-based visualization; and
- Evaluation of the overall efficacy of MCD3 by studying the use of MCD3 elements in visualization code.
Finally, these contributions combine to form our overall finding: the MCD3 approach is effective for creating a portably-performant many-core visualization library.
Section snippets
Motivation, background, and related work
This section is divided into three parts. First, Section 2.1 summarizes background on scientific visualization on supercomputers, and provides motivation for MCD3’s goals of minimizing developer time and efficient support for many-core architectures. Next, Section 2.2 provides background on data-parallel primitives. Finally, Section 2.3 surveys the works most closely related to our own research.
MCD3 design
The MCD3 design relies on five constructs:
- Devices, which enable code to run on a given hardware architecture.
- DPPs, which provide parallel processing patterns.
- Data management, which insulates algorithms from data layout complexities. These complexities range from how data is organized (e.g., structure-of-arrays vs. array-of-structures) to different types of meshes (e.g., unstructured, rectilinear, etc.) to different memory spaces (e.g., host memory, device memory, or unified managed memory).
- Convenience routines, which provide common operations useful to visualization algorithms, such as locating which cell contains a point or finding the minimum and maximum values of a field.
- Meta-DPPs, which combine DPPs and data management to address issues such as following links in data and jagged access patterns.
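The data-management construct can be illustrated with the structure-of-arrays vs. array-of-structures distinction it mentions: an algorithm written once against a small accessor interface works with either memory layout. The class and function names below are illustrative sketches, not the actual VTK-m types.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Array-of-structures layout: x, y, z interleaved per point.
struct AosPoints {
    std::vector<std::array<float, 3>> data;
    std::array<float, 3> get(std::size_t i) const { return data[i]; }
};

// Structure-of-arrays layout: one contiguous array per component,
// often preferred for coalesced access on GPUs.
struct SoaPoints {
    std::vector<float> x, y, z;
    std::array<float, 3> get(std::size_t i) const {
        return {x[i], y[i], z[i]};
    }
};

// An algorithm written once against the accessor is layout-agnostic.
// Here it just sums the x components of the first n points.
template <typename Points>
float sumX(const Points& pts, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += pts.get(i)[0];
    return s;
}
```

The same insulation idea extends to mesh types and memory spaces: the algorithm sees one interface while the library chooses the concrete layout and location.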
Results
We organize results into three areas:
- Section 4.1 evaluates developer efficiency.
- Section 4.2 considers the efficacy of the overall MCD3 system.
- Section 4.3 evaluates performance efficiency.
The results in the first two areas consider the usage of MCD3 constructs within the VTK-m library. To generate the data for these results, we analyzed the current collection of 57 algorithms within the VTK-m source code. The process for collecting this data was automated, and is discussed in Appendix A.
Conclusion and future work
Overall, we feel our results demonstrate the efficacy of MCD3 for our twin goals of minimizing developer time while achieving efficient portable performance on many-core architectures. For developer performance, we feel our analysis calculating the effort for a DPP-only equivalent library to VTK-m is compelling. While the 3.1X number has approximations, the number is high enough that we feel it clearly speaks to savings for the VTK-m development team due to MCD3 principles, especially when
Declaration of Competing Interest
The authors have not disclosed any potential or pertinent conflicts of interest with this work. For full disclosure statements refer to https://doi.org/10.1016/j.parco.2021.102834.
Acknowledgments
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
References (57)
- Scout: a data-parallel programming language for graphics processors. Parallel Comput. (2007)
- ParaView: An end-user tool for large data visualization. The Visualization Handbook (2005)
- VisIt: An end-user tool for visualizing and analyzing very large data. High Performance Visualization: Enabling Extreme-Scale Scientific Insight (2012)
- Vector Models for Data-Parallel Computing (1990)
- VTK-m: Accelerating the visualization toolkit for massively threaded architectures. IEEE Comput. Graph. Appl. (CG&A) (2016)
- High Performance Visualization: Enabling Extreme-Scale Scientific Insight (2012)
- The application visualization system: A computational environment for scientific visualization. Comput. Graph. Appl. (1989)
- An Extended Data-Flow Architecture for Data Analysis and Visualization. Research Report RC 20001 (88338) (1995)
- The design and implementation of an object-oriented toolkit for 3D graphics and visualization
- The EnSight visualization application. High Performance Visualization: Enabling Extreme-Scale Scientific Insight (2012)
- Interactive investigation of fluid mechanics data sets
- MegaMol: A prototyping framework for particle-based visualization. IEEE Trans. Vis. Comput. Graphics
- Interactive desktop analysis of high resolution simulations: application to turbulent plume dynamics and current sheet formation. New J. Phys.
- yt: A multi-code analysis toolkit for astrophysical simulation data. Astrophys. J. Suppl. Ser.
- In situ methods, infrastructures, and applications on high performance computing platforms. Comput. Graph. Forum
- A terminology for in situ visualization and analysis systems. Int. J. High Perform. Comput. Appl.
- Scientific discovery at the exascale: Report for the DOE ASCR workshop on exascale data management, analysis, and visualization
- Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources. Int. J. High Perform. Comput. Appl.
- In situ visualization for computational science. IEEE Comput. Graph. Appl.
- C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++
- Thrust: A productivity-oriented library for CUDA
- GPU-Based Interactive Visualization Techniques
- GPU-based visualization. High Performance Visualization: Enabling Extreme-Scale Scientific Insight
- A survey of compressed GPU-based direct volume rendering. Eurographics (STARs)
- State-of-the-art in GPU-based large-scale volume visualization
Cited by (4)
- VTK-m: Visualization for the Exascale Era and Beyond. Proceedings - SIGGRAPH 2023 Talks (2023)
- A Distributed-Memory Parallel Approach for Volume Rendering with Shadows. Proceedings - 2023 IEEE 13th Symposium on Large Data Analysis and Visualization, LDAV (2023)
- The Need for Pervasive In Situ Analysis and Visualization (P-ISAV). Lecture Notes in Computer Science (2022)
- Hybrid Analysis of Fusion Data for Online Understanding of Complex Science on Extreme Scale Computers. Proceedings - IEEE International Conference on Cluster Computing, ICCC (2022)