Parallel Computing, Volume 108, December 2021, 102834

Minimizing development costs for efficient many-core visualization using MCD3

https://doi.org/10.1016/j.parco.2021.102834

Highlights

  • It introduces a design for portable performance for scientific visualization.

  • It evaluates performance of the design over architectures and algorithms.

  • It shows the performance is comparable to hardware-specific approaches.

  • It evaluates development time savings for visualization algorithms.

  • It establishes the design is neither too complicated nor too simple.

Abstract

Scientific visualization software increasingly needs to support many-core architectures. However, development time is a significant challenge due to the breadth and diversity of both visualization algorithms and architectures. With this work, we introduce a development environment for visualization algorithms on many-core devices that extends the traditional data-parallel primitive (DPP) approach with several existing constructs and an important new construct: meta-DPPs. We refer to our approach as MCD3 (Meta-DPPs, Convenience routines, Data management, DPPs, and Devices). The twin goals of MCD3 are to reduce developer time and to deliver efficient performance on many-core architectures, and our evaluation considers both of these goals. For development time, we study 57 algorithms implemented in the VTK-m software library and determine that MCD3 leads to significant savings. For efficient performance, we survey ten studies looking at individual algorithms and determine that the MCD3 hardware-agnostic approach leads to performance comparable to hardware-specific approaches: sometimes better, sometimes worse, and better in the aggregate. In total, we find that MCD3 is an effective approach for scientific visualization libraries to support many-core architectures.

Introduction

As supercomputers increasingly include many-core architectures, visualization software design must adapt to support these architectures. This is a significant challenge, since existing parallel visualization software, like ParaView [1] and VisIt [2], has primarily focused on MPI-only parallelism. Further, these efforts represent hundreds of person-years of effort and contain hundreds of algorithms. As a result, new, many-core-capable designs must achieve twin goals: efficient performance and small development time. The latter goal is particularly pressing not only because of the large number of visualization algorithms but also because of the large number of potential hardware architectures. The worst case would require an implementation for every possible pair of algorithm and architecture. A better scenario is to achieve "portable performance", i.e., a single, hardware-agnostic implementation of each algorithm that runs efficiently on all architectures.

Data-parallel primitives [3] (DPPs) have the potential to achieve the twin goals of efficient performance and small development time, but their application in the visualization space is non-trivial. In particular, the nature of scientific data (mesh-based data) incurs extra burden with DPPs, as most DPPs are designed to operate on arrays of data and are not concerned with issues such as which vertices lie within a cell. This has two unfortunate consequences for implementing visualization algorithms with DPPs. First, these complex data structures place an increased indexing burden on the developer, who has to maintain and follow links in the data. Second, the irregular data typical of visualization algorithms means that the products of a visualization algorithm do not always have a one-to-one relationship with the input. Some inputs contribute nothing to the output, whereas other inputs generate multiple outputs, creating a "jagged" access pattern for the inputs and outputs of a visualization algorithm. As a result, using DPPs could cause more harm than good: visualization algorithm developers could spend more time addressing data model issues than they would gain from DPPs' benefits.
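To make this jagged pattern concrete, the following sketch (our own illustration in generic C++17, not code from the VTK-m library) shows the classic three-phase DPP idiom for variable-sized outputs: a map counts the outputs each input will produce, an exclusive scan turns those counts into write offsets, and a final map writes each input's outputs at its offset. The input values and the counting rule are arbitrary placeholders.

    // Illustrative sketch of the DPP "count / scan / generate" idiom for
    // jagged outputs. Phases 1 and 3 are embarrassingly parallel maps and
    // phase 2 is a standard DPP scan; a serial loop stands in for the
    // final map here to keep the example dependency-free.
    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
        // Placeholder inputs, e.g., per-cell case values.
        std::vector<int> input = {5, 1, 4, 3, 2};

        // Phase 1 (map): how many outputs each input contributes. The
        // rule (v % 3) is an arbitrary stand-in for, e.g., a marching
        // cubes case-table lookup.
        std::vector<int> counts(input.size());
        std::transform(input.begin(), input.end(), counts.begin(),
                       [](int v) { return v % 3; });

        // Phase 2 (scan): exclusive prefix sum yields write offsets.
        std::vector<int> offsets(input.size());
        std::exclusive_scan(counts.begin(), counts.end(), offsets.begin(), 0);

        // Phase 3 (map): each input writes its outputs independently.
        std::vector<int> output(offsets.back() + counts.back());
        for (std::size_t i = 0; i < input.size(); ++i)
            for (int j = 0; j < counts[i]; ++j)
                output[offsets[i] + j] = static_cast<int>(i); // producer id

        for (int v : output) std::cout << v << ' ';  // prints: 0 0 1 2 4 4
        std::cout << '\n';
    }

Without the scan to establish offsets, each parallel task would not know where to write its variable number of results; this bookkeeping is exactly what makes jagged outputs painful to express with a naive map alone.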

With this paper, we introduce a design that augments DPPs with several constructs to shield visualization algorithm developers from potential pitfalls. Our design incorporates existing data management practices that separate memory layout from execution space, adding support for many data layouts and mesh types. We also incorporate the common practice of providing convenience routines for common operations, with our unique contribution being a selection of routines useful for scientific visualization algorithms, such as locating the cell that contains a point or finding the minimum and maximum values of a field. Finally, and most importantly, we introduce a new construct, meta-DPPs, which combine DPPs and data management to address issues such as following links in data and jagged access patterns. In all, we refer to our algorithm development environment as MCD3 (Meta-DPPs, Convenience routines, Data management, DPPs, and Devices).
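The toy sketch below illustrates the idea behind a meta-DPP; the names (MapCellToPoints, PointsPerCell) are hypothetical and do not reflect VTK-m's actual API. The point is that the connectivity-following logic, which every cell-centered algorithm would otherwise re-implement, lives once inside the construct, and the developer's functor simply receives the gathered point values.

    // Hypothetical meta-DPP sketch: a map over cells that resolves the
    // cell-to-point connectivity on the developer's behalf.
    #include <array>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Toy unstructured mesh: every cell references 4 point ids.
    constexpr std::size_t PointsPerCell = 4;

    template <typename Functor>
    void MapCellToPoints(const std::vector<std::size_t>& connectivity,
                         const std::vector<double>& pointField,
                         std::vector<double>& cellResult, Functor f) {
        const std::size_t numCells = connectivity.size() / PointsPerCell;
        // In a real library this loop would be a parallel DPP map; the
        // index-following below lives here, once, rather than in every
        // algorithm that operates on cells.
        for (std::size_t c = 0; c < numCells; ++c) {
            std::array<double, PointsPerCell> values;
            for (std::size_t j = 0; j < PointsPerCell; ++j)
                values[j] = pointField[connectivity[c * PointsPerCell + j]];
            cellResult[c] = f(values);
        }
    }

    int main() {
        std::vector<std::size_t> conn = {0, 1, 2, 3,  2, 3, 4, 5}; // two cells
        std::vector<double> field = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
        std::vector<double> avg(2);
        // The algorithm developer writes only the per-cell computation.
        MapCellToPoints(conn, field, avg, [](const auto& v) {
            return (v[0] + v[1] + v[2] + v[3]) / 4.0; // per-cell average
        });
        std::cout << avg[0] << ' ' << avg[1] << '\n'; // prints: 2.5 4.5
    }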

The fundamental research questions of this paper are on the overall viability of the MCD3 approach for many-core visualization and on evaluating the twin goals of minimizing development costs while providing efficient execution times. These questions are explored by analyzing the VTK-m library [4], which is an open source effort that is using MCD3 for many-core visualization. In all, the contributions of this paper are:

  • New meta-DPPs designed for common patterns for scientific visualization algorithms;

  • Evaluation of developer costs by studying 57 visualization algorithms and their usage of MCD3;

  • Evaluation of efficient execution time by surveying 10 performance studies using MCD3 or DPP-based visualization; and

  • Evaluation of overall efficacy of MCD3 by studying the use of MCD3 elements in visualization code.

Finally, these contributions combine to form our overall finding: the MCD3 approach is effective for creating a portably-performant many-core visualization library.


Motivation, background, and related work

This section is divided into three parts. First, Section 2.1 summarizes background on scientific visualization on supercomputers, and provides motivation for MCD3’s goals of minimizing developer time and efficient support for many-core architectures. Next, Section 2.2 provides background on data-parallel primitives. Finally, Section 2.3 surveys the works most closely related to our own research.

MCD3 design

The MCD3 design relies on five constructs:

  • Devices, which enable code to run on a given hardware architecture.

  • DPPs, which provide parallel processing patterns.

  • Data management, which insulates algorithms from data layout complexities. These complexities range from how data is organized (e.g., structure-of-arrays vs. array-of-structures) to different types of meshes (e.g., unstructured, rectilinear, etc.) to different memory spaces (e.g., host memory, device memory, or unified managed memory). A brief layout sketch follows this list.

  • Convenience routines, which package operations common to visualization algorithms, such as locating the cell that contains a point or finding the minimum and maximum values of a field.

  • Meta-DPPs, which combine DPPs with data management to address common patterns such as following links in mesh data and jagged input-to-output relationships.
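As one illustration of how data management can insulate algorithms from layout, the sketch below (a toy of our own devising, not VTK-m's ArrayHandle machinery) presents both layouts behind a single accessor, so the same algorithm code compiles against either.

    // Toy illustration: one Get() interface over two physical layouts.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct Vec3 { double x, y, z; };

    // Array-of-structures storage: one contiguous array of Vec3.
    struct AosPoints {
        std::vector<Vec3> data;
        Vec3 Get(std::size_t i) const { return data[i]; }
    };

    // Structure-of-arrays storage: three separate component arrays.
    struct SoaPoints {
        std::vector<double> x, y, z;
        Vec3 Get(std::size_t i) const { return {x[i], y[i], z[i]}; }
    };

    // Algorithm code is written once against Get(); the layout is a
    // compile-time template parameter, so the abstraction costs nothing
    // at run time.
    template <typename Points>
    double SumOfX(const Points& pts, std::size_t n) {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) sum += pts.Get(i).x;
        return sum;
    }

    int main() {
        AosPoints a{{{1, 0, 0}, {2, 0, 0}}};
        SoaPoints s{{1, 2}, {0, 0}, {0, 0}};
        std::cout << SumOfX(a, 2) << ' ' << SumOfX(s, 2) << '\n'; // 3 3
    }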

Results

We organize results into three areas:

  • Section 4.1 evaluates developer efficiency.

  • Section 4.2 considers the efficacy of the overall MCD3 system.

  • Section 4.3 evaluates performance efficiency.

The results in the first two areas consider the usage of MCD3 constructs within the VTK-m library. To generate the data for these results, we analyzed the current collection of 57 algorithms within the VTK-m source code. The process for collecting this data was automated and is discussed in Appendix A.

Conclusion and future work

Overall, we feel our results demonstrate the efficacy of MCD3 for our twin goals of minimizing developer time while achieving efficient, portable performance on many-core architectures. For developer time, we feel our analysis calculating the effort for a DPP-only equivalent library to VTK-m is compelling. While the 3.1X figure involves approximations, it is high enough that we feel it clearly speaks to savings for the VTK-m development team due to MCD3 principles, especially when…

Declaration of Competing Interest

No author associated with this paper has disclosed any potential or pertinent conflicts that may be perceived to conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.parco.2021.102834.

Acknowledgments

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.

References (57)

  • P. McCormick, Scout: a data-parallel programming language for graphics processors, Parallel Comput. (2007)
  • J. Ahrens et al., ParaView: An end-user tool for large data visualization, The Visualization Handbook (2005)
  • H. Childs, VisIt: An end-user tool for visualizing and analyzing very large data, High Performance Visualization: Enabling Extreme-Scale Scientific Insight (2012)
  • G.E. Blelloch, Vector Models for Data-Parallel Computing (1990)
  • K. Moreland, VTK-m: Accelerating the visualization toolkit for massively threaded architectures, IEEE Comput. Graph. Appl. (CG&A) (2016)
  • High Performance Visualization: Enabling Extreme-Scale Scientific Insight (2012)
  • C. Upson, The application visualization system: A computational environment for scientific visualization, Comput. Graph. Appl. (1989)
  • G. Abram et al., An Extended Data-Flow Architecture for Data Analysis and Visualization, Research Report RC 20001 (88338) (1995)
  • W.J. Schroeder et al., The design and implementation of an object-oriented toolkit for 3D graphics and visualization
  • R. Frank et al., The EnSight visualization application, High Performance Visualization: Enabling Extreme-Scale Scientific Insight (2012)
  • S.M. Legensky, Interactive investigation of fluid mechanics data sets
  • S. Grottel et al., MegaMol: A prototyping framework for particle-based visualization, IEEE Trans. Vis. Comput. Graphics (2015)
  • S.G. Parker, C.R. Johnson, SCIRun: a scientific programming environment for computational steering, in: Proceedings of...
  • J. Clyne et al., Interactive desktop analysis of high resolution simulations: application to turbulent plume dynamics and current sheet formation, New J. Phys. (2007)
  • M. Turk, yt: A multi-code analysis toolkit for astrophysical simulation data, Astrophys. J. Suppl. Ser. (2011)
  • A.C. Bauer, In situ methods, infrastructures, and applications on high performance computing platforms, Comput. Graph. Forum (2016)
  • H. Childs, A terminology for in situ visualization and analysis systems, Int. J. High Perform. Comput. Appl. (2020)
  • S. Ahern, Scientific discovery at the exascale: Report for the DOE ASCR workshop on exascale data management, analysis, and visualization (2011)
  • T. Peterka, Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources, Int. J. High Perform. Comput. Appl. (2020)
  • H. Childs et al., In situ visualization for computational science, IEEE Comput. Graph. Appl. (2019)
  • K. Gregory et al., C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++ (2012)
  • S. Iwasaki et al., BOLT: Optimizing OpenMP parallel regions with user-level threads, in: International Conference on...
  • J. Szuppe, Boost.Compute: A parallel computing library for C++ based on OpenCL, in: Proceedings of the 4th...
  • N. Bell et al., Thrust: A productivity-oriented library for CUDA
  • D. Weiskopf, GPU-Based Interactive Visualization Techniques (2007)
  • M. Ament et al., GPU-based visualization, High Performance Visualization: Enabling Extreme-Scale Scientific Insight (2012)
  • M.B. Rodriguez, A survey of compressed GPU-based direct volume rendering, Eurographics (STARs) (2013)
  • J. Beyer et al., State-of-the-art in GPU-based large-scale volume visualization