A component-based framework for certification of components in a cloud of HPC services

https://doi.org/10.1016/j.scico.2019.102379

Highlights

  • HPC Shelf is a component-oriented cloud computing platform for HPC Services.

  • A component-oriented certification framework has been proposed for HPC Shelf.

  • The certification framework integrates formal verification tools.

  • The certification framework supports the VaaS (Verification-as-a-Service) paradigm.

  • SWC2 is a certifier for scientific workflows in HPC Shelf.

Abstract

HPC Shelf is a proposed cloud computing platform that provides component-oriented services for High Performance Computing (HPC) applications. This paper presents a Verification-as-a-Service (VaaS) framework for component certification on HPC Shelf. Certification aims to provide higher confidence that components of parallel computing systems of HPC Shelf behave as expected according to one or more requirements expressed in their contracts. To this end, new abstractions are introduced, starting with certifier components. They are designed to inspect other components and verify them against different types of functional, non-functional and behavioral requirements. The certification framework is naturally based on parallel computing techniques to speed up verification tasks.

Introduction

HPC Shelf is a cloud computing platform aimed at addressing domain-specific, computationally intensive problems typically emerging from computational science and engineering domains. For this purpose, it provides a range of High Performance Computing (HPC) services based on parallel computing systems. They are built from the orchestration of parallel components representing both software and hardware elements of HPC systems. The hardware elements represent distributed memory parallel computing platforms such as clusters and MPPs. Software components, representing parallel computations, attempt to extract the best performance from them.

Parallel computing systems are managed by SAFe (Shelf Application Framework) [1]. By means of SAFe, application providers build applications, through which domain specialists access the services of HPC Shelf. Applications are domain-specific problem-solving environments, such as web portals [2]. They provide a high-level interface through which specialists specify problems. Computational solutions to these problems are automatically generated according to rules programmed by application providers, in the form of parallel computing systems.

Application providers must have the technical background to create computational solutions to problems in their application domains. In HPC Shelf, they must be able to identify and combine components to form parallel computing systems. Background on parallel computing platforms and on programming for such platforms is therefore not required of application providers; it is required of component developers. Combined with the inherent complexity of parallel system design, this fact implies the need for effective mechanisms to ensure that, in parallel computing systems, components and the interactions between them behave as expected by application providers and as predicted by component developers. In software engineering, this problem is known as certification of software components [3], [4], [5], [6], [7]. In the context of HPC Shelf, the certification problem can be seen from two perspectives. In the component perspective, each component implementation is verified against the functional, non-functional, and behavioral requirements declared in its published interface. In the system perspective, typical safety and liveness properties of the component orchestration workflow should be ensured.
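To make the system perspective concrete, the following minimal sketch checks one safety property, absence of deadlock, on a toy workflow by explicit-state breadth-first search. It is an illustration only, not the method used by HPC Shelf or its certifiers: the function `find_deadlock`, the encoding of a workflow as a dictionary of successor states, and the state names are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical encoding: a workflow is a map from each state to its successors.
Workflow = Dict[str, List[str]]


def find_deadlock(workflow: Workflow, initial: str,
                  final: Set[str]) -> Optional[List[str]]:
    """Search breadth-first for a reachable non-final state without successors.

    Returns a shortest path from `initial` to such a state (a counterexample),
    or None when no deadlock state is reachable.
    """
    parent: Dict[str, Optional[str]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        successors = workflow.get(state, [])
        if not successors and state not in final:
            # Rebuild the counterexample by following parent links backwards.
            path: List[str] = []
            cursor: Optional[str] = state
            while cursor is not None:
                path.append(cursor)
                cursor = parent[cursor]
            return path[::-1]
        for nxt in successors:
            if nxt not in parent:  # visit each state at most once
                parent[nxt] = state
                queue.append(nxt)
    return None


# Toy workflows (hypothetical): the second one can get stuck in "wait".
ok_workflow = {"start": ["compute"], "compute": ["gather"],
               "gather": ["done"], "done": []}
bad_workflow = {"start": ["compute", "wait"], "compute": ["done"],
                "wait": [], "done": []}

ok_result = find_deadlock(ok_workflow, "start", {"done"})
bad_result = find_deadlock(bad_workflow, "start", {"done"})
```

Model checkers apply the same idea to much larger state spaces and to richer properties, including liveness, which this sketch does not cover.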

The need for rigorous validation, leading to some form of system certification, brings formal methods into the picture in order to identify, and possibly rule out, faulty behaviors in applications. Such methods are, however, not so common in the domain of HPC systems, due to their inherent complexity and the difficulty of their concrete implementation. Indeed, HPC systems are defined by heterogeneous computing platforms composed concurrently. In fact, in spite of more than three decades of active research in formal methods for software development and verification, we are still far from what should be the practice of a true engineering discipline, supported by a stable and sound mathematical basis. In most cases, testing and a posteriori empirical error detection are still dominant, even in scenarios where formal verification is a requirement (e.g. safety-critical systems).

The work reported in this article presents the proposal for a cloud-based general-purpose certification framework for HPC Shelf. Through the proposed framework, components called certifiers may use a set of different certification tools to certify that the components of parallel computing systems meet a certain set of requirements. The case studies used to demonstrate the proposed certification framework are particularly focused on functional and behavioral requirements that can be verified through automated verification methods and tools, such as theorem provers and model checkers. The certification process becomes integrated with the parallel computing systems in a highly modular way, so that new certifier components may be inserted according to the verification tasks required in the certification process.

The certification process may be carried out in parallel. For this, certifier components are defined as parallel certification systems, analogous to parallel computing systems. Parallel certification systems contain a certification-workflow component and a set of tactical components, each one providing access to an existing certification tool or infrastructure running in a parallel computing platform. Thus, within tactical components, parallel computing may help exploit the maximum performance of the underlying verification infrastructures to accelerate the certification process.
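The structure described above can be sketched, under strong simplifying assumptions, as a certifier that dispatches every property to every tactical component concurrently and records which tacticals discharged it. The classes `Tactical` and `Certifier`, the toy checks, and the property strings are hypothetical and do not reproduce the HPC Shelf API; threads on one machine stand in for tools running on separate parallel computing platforms.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Tactical:
    """Hypothetical stand-in for a tactical component wrapping one tool."""
    name: str
    check: Callable[[str], bool]  # True when the tool proves the property


@dataclass
class Certifier:
    """Hypothetical stand-in for a certifier over a set of tacticals."""
    tacticals: List[Tactical]

    def certify(self, properties: List[str]) -> Dict[str, List[str]]:
        # One job per (property, tactical) pair, all submitted concurrently.
        jobs = [(p, t) for p in properties for t in self.tacticals]
        with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
            # pool.map preserves the order of `jobs`.
            outcomes = list(pool.map(lambda job: job[1].check(job[0]), jobs))
        proved: Dict[str, List[str]] = {p: [] for p in properties}
        for (prop, tactical), ok in zip(jobs, outcomes):
            if ok:
                proved[prop].append(tactical.name)
        return proved


# Toy tacticals (assumption): real ones would invoke a prover or model checker.
certifier = Certifier([
    Tactical("toy_prover", lambda prop: prop.startswith("functional")),
    Tactical("toy_checker", lambda prop: prop.startswith("behavioral")),
])
report = certifier.certify(
    ["functional:sorted_output", "behavioral:no_deadlock", "other:unchecked"])
```

In this sketch a property with an empty list in `report` was discharged by no tactical; how such a case affects the final certification verdict is a policy decision that the sketch leaves open.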

Summing up, the main artifacts produced by the work whose results are reported in this article are the following ones:

  • A general-purpose certification framework for HPC Shelf;

  • A class of certifier components, named C4, for the certification of computation components of HPC Shelf;

  • Another class of certifier components, named SWC2, for the certification of workflow components;

  • A set of tactical components to make the bridge between the above certifier components and existing formal verification tools.

C4 and SWC2 serve as a proof-of-concept validation of the certification framework of HPC Shelf. The framework has also been evaluated in the context of other cloud-based software certification initiatives, with emphasis on works related to VaaS, HPC, and automatic software-verification tools. From this assessment, the following outstanding features and contributions have been identified in favor of the certification framework of HPC Shelf:

  • It is general purpose, in the sense that it is not intended to certify a particular requirement, although the case studies presented in this article focus on the verification of functional and behavioral properties through deductive program verification and model checking tools.

  • It does not certify only software components, but any kind of component, including components representing hardware elements, such as parallel computing platforms in HPC Shelf.

  • It is fully component-oriented, with seamless integration with the environment, in the sense that certification is introduced by certifier and tactical components that may encapsulate certification tools.

  • It is the first certification framework in the context of component-based high performance computing (CBHPC), where certification may avoid the wasting of time and financial resources due to delays, crashes, and wrong outputs in the execution of long-running computations.

  • It is the first VaaS framework applied in the context of HPC.

  • It introduces new ideas for VaaS framework design, such as:

    • the use of component-orientation to support a higher level of abstraction with respect to underlying formal verification tools;

    • the clear role separation among certification authorities, component developers, and system builders (application providers);

  • It presents a general method for exploiting parallel processing to speed up certification tasks at several levels, by using the parallel computing infrastructure on which the parallel components subject to certification run.

Article structure. After a description of HPC Shelf in Section 2, its certification framework is introduced in Section 3. Sections 4 and 5 detail, respectively, the architecture of C4 and SWC2 certifiers. Next, Section 6 presents some case studies to demonstrate the use of C4 and SWC2 certifiers in parallel computing systems. A discussion of related work identified through a systematic search in scientific databases is presented in Section 7, with emphasis on certification of component-based software systems and VaaS. Finally, Section 8 presents concluding remarks, pointing to further work.

Section snippets

HPC Shelf

HPC Shelf is a cloud computing platform that provides HPC services for providers of domain-specific applications. An application is a problem-solving environment through which specialist users, the end users of HPC Shelf, specify problems and obtain computational solutions for them. It is assumed that these solutions are computationally intensive, thus demanding the use of large-scale parallel computing infrastructure, i.e. comprising multiple parallel computing platforms engaged in a single

The certification framework

For the purpose of leveraging component certification in HPC Shelf, a certification framework is introduced in this section. It encompasses a set of component kinds, composition rules and design patterns. They provide an environment where certification tools can be encapsulated into components to provide some level of assurance to application providers and component developers that components of parallel computing systems meet a predetermined set of requirements prior to their instantiation.

C4: certifiers for computation components

Using the certification framework introduced in Section 3, a class of certifiers for computation components, called C4, is proposed. The name is an acronym for Certifier Components of Computation Components.

The units of a computation component may be viewed as processes running on different processing nodes of a virtual platform. These units can be aggregated into a parallel unit that represents a team of units programmed in the SPMD (Single Program Multiple Data) parallel programming pattern,

SWC2: certifiers for workflow components

As described in Section 2, in the architecture of a parallel computing system, a singleton component called workflow represents the orchestration engine. In the current implementation of HPC Shelf, it may be implemented in two ways:

  • using a host programming language (currently, C#), by activating actions using the ITaskBinding interface of action ports (Listing 1);

  • using SAFeSWL, a specific-purpose scientific workflow language for orchestrating parallel components, driving the execution of

Case studies

In this section, three case studies demonstrate the certification framework of HPC Shelf, as well as the use of C4 and SWC2 certifiers.

Performance evaluation experiments have been performed to evidence that the cost of component certification may be reasonable and will not make it impractical to run a parallel computing system with some certifiable components. In order to isolate the pure verification times from the overheads of parallel certification system deployment, the sequential times

Related work

The certification framework of HPC Shelf has not been designed as an incremental evolution of some pre-existing certification framework that could have been taken as a basis. It has been developed from scratch, under its own assumptions, to meet the particular requirements of HPC Shelf. Thus, with regard to the comparison of the certification framework herein proposed with other related works, our main challenge has been to study the literature to find such related works, and then to study

Conclusions and future work

This article has proposed a certification framework for HPC Shelf, aimed at certifying components of different kinds with respect to a given set of requirements. By using it, different sorts of certification components, with interfaces to existing formal verification tools, can be added to a parallel computing system. Certification of isolated components, including workflows that orchestrate parallel computing systems, is therefore provided as another service in the cloud, in a truly reflexive

References (94)

  • P. Calegari et al., "Web portals for high-performance computing: a survey", ACM Trans. Web (2019)
  • C. Wohlin et al., "Certification of software components", IEEE Trans. Softw. Eng. (1994)
  • J.M. Voas, "Certifying off-the-shelf software components", Computer (1998)
  • J. Morris et al., "Software component certification", Computer (2001)
  • A. Alvaro et al., "Software component certification: a survey"
  • J. Boegh, "Certifying software component attributes", IEEE Softw. (2006)
  • F.H. de Carvalho Junior et al., "Towards an architecture for component-oriented parallel programming", Concurr. Comput.: Pract. Exp. (2007)
  • J. Dean et al., "MapReduce: simplified data processing on large clusters", Commun. ACM (2008)
  • C.A. Rezende et al., "MapReduce with components for processing big graphs"
  • W.G. Al-Alam et al., "Contextual contracts for component-based resource abstraction in a cloud of HPC services"
  • J. Dongarra et al., "An Introduction to the MPI Standard" (Jan. 1995)
  • J.C. Reynolds, "Separation logic: a logic for shared mutable data structures"
  • S. Owicki et al., "An axiomatic proof technique for parallel programs I", Acta Inform. (1976)
  • K.R. Apt, "Correctness proofs of distributed termination algorithms", ACM Trans. Program. Lang. Syst. (1986)
  • H.A. López et al., "Protocol-based verification of message-passing parallel programs"
  • E. Cohen et al., "VCC: a practical system for verifying concurrent C"
  • B. Jacobs et al., "VeriFast: a powerful, sound, predictable, fast verifier for C and Java"
  • P. Cuoq et al., "Frama-C"
  • M. Barnett et al., "Boogie: a modular reusable verifier for object-oriented programs"
  • F. Bobot et al., "Why3: shepherd your herd of provers"
  • F. Bobot et al., "The Alt-Ergo automated theorem prover"
  • C. Barrett et al., "CVC3"
  • C. Barrett et al., "CVC4"
  • L. De Moura et al., "Z3: an efficient SMT solver"
  • S. Schulz, "E – a brainiac theorem prover", AI Commun. (2002)
  • C. Weidenbach et al., "SPASS version 3.5"
  • A. Riazanov et al., "Vampire"
  • The Coq development team, "The Coq Proof Assistant Reference Manual, LogiCal Project, version 8.0"
  • T. Nipkow et al., "Isabelle/HOL: A Proof Assistant for Higher-Order Logic" (2002)
  • A. Vo et al., "Formal verification of practical MPI programs", SIGPLAN Not. (2009)
  • S.F. Siegel et al., "CIVL: the concurrency intermediate verification language"
  • F. Bobot et al., "Let's verify this with Why3", Int. J. Softw. Tools Technol. Transf. (2014)
  • J. Qin et al., "UML based grid workflow modeling under ASKALON"
  • B. Wassermann et al., "Sedna: A BPEL-Based Environment for Visual Scientific Workflow Modeling" (2007)
  • B. Ludäscher et al., "Scientific workflow management and the Kepler system", Concurr. Comput.: Pract. Exp. (2006)
  • E. Deelman et al., "Pegasus: a framework for mapping complex scientific workflows onto distributed systems", Sci. Program. (2005)
  • K. Wolstencroft et al., "The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud", Nucleic Acids Res. (2013)