A component-based framework for certification of components in a cloud of HPC services
Introduction
HPC Shelf is a cloud computing platform aimed at addressing domain-specific, computationally intensive problems typically emerging from computational science and engineering domains. For this purpose, it provides a range of High Performance Computing (HPC) services based on parallel computing systems. These systems are built from the orchestration of parallel components representing both software and hardware elements of HPC systems. The hardware elements represent distributed memory parallel computing platforms such as clusters and MPPs. Software components, representing parallel computations, attempt to extract the best performance from them.
Parallel computing systems are managed by SAFe (Shelf Application Framework) [1]. By means of SAFe, application providers build applications, through which domain specialists access the services of HPC Shelf. Applications are domain-specific problem-solving environments, such as web portals [2]. They provide a high-level interface through which specialists specify problems. Computational solutions to these problems are automatically generated according to rules programmed by application providers, in the form of parallel computing systems.
Application providers must have the technical background to create computational solutions to problems in their application domains. In HPC Shelf, this means being able to identify and combine components to form parallel computing systems. Background on parallel computing platforms, and on programming for such platforms, is therefore not required of application providers; it is required of component developers. Combined with the inherent complexity of parallel system design, this separation implies the need for effective mechanisms to ensure that, in parallel computing systems, components and the interactions between them behave as expected by application providers and as predicted by component developers. In software engineering, this problem is known as certification of software components [3], [4], [5], [6], [7]. In the context of HPC Shelf, the certification problem can be seen from two perspectives. In the component perspective, each component implementation is verified against the functional, non-functional, and behavioral requirements declared in its published interface. In the system perspective, typical safety and liveness properties of the component orchestration workflow should be ensured.
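To make the system perspective concrete, the following minimal sketch checks two properties on a toy orchestration workflow modeled as an explicit state graph. It is an illustration only, not the verification method of HPC Shelf: the state names, the `workflow` graph, and the functions `is_safe` and `can_always_finish` are all hypothetical. The second check (a final state stays reachable from every reachable state) is a simple stand-in that is weaker than a true liveness property.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical representation: each workflow state maps to its successor states.
Graph = Dict[str, Set[str]]


def reachable(graph: Graph, start: str) -> Set[str]:
    """Return every state reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_safe(graph: Graph, start: str, bad: Set[str]) -> bool:
    """Safety-style check: no bad state is reachable from the start state."""
    return reachable(graph, start).isdisjoint(bad)


def can_always_finish(graph: Graph, start: str, final: Set[str]) -> bool:
    """Stand-in for liveness: every reachable state can still reach a final state."""
    return all(
        not reachable(graph, state).isdisjoint(final)
        for state in reachable(graph, start)
    )


# Toy orchestration workflow (assumed for illustration only).
workflow: Graph = {
    "init": {"deploy"},
    "deploy": {"compute"},
    "compute": {"compute", "done"},
    "done": set(),
}

# A faulty variant in which deployment may lead to a dead-end state.
broken: Graph = dict(workflow)
broken["deploy"] = {"compute", "stuck"}
broken["stuck"] = set()
```

On the toy graph both checks succeed, whereas the `broken` variant fails both as soon as `stuck` is declared a bad state. A model checker applies the same idea to state spaces far too large to enumerate by hand.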
The need for rigorous validation, leading to some form of system certification, brings formal methods into the picture in order to identify, and possibly rule out, faulty behaviors in applications. Such methods are, however, not so common in the domain of HPC systems, due to their inherent complexity and the difficulty of their concrete implementation. Indeed, HPC systems are defined by heterogeneous computing platforms composed concurrently. In fact, in spite of more than three decades of active research in formal methods for software development and verification, we are still far from what should be the practice of a true engineering discipline, supported by a stable and sound mathematical basis. In most cases, testing and a posteriori empirical error detection are still dominant, even in scenarios where formal verification is a requirement (e.g. safety-critical systems).
The work reported in this article presents the proposal for a cloud-based general-purpose certification framework for HPC Shelf. Through the proposed framework, components called certifiers may use a set of different certification tools to certify that the components of parallel computing systems meet a certain set of requirements. The case studies used to demonstrate the proposed certification framework are particularly focused on functional and behavioral requirements that can be verified through automated verification methods and tools, such as theorem provers and model checkers. The certification process becomes integrated with the parallel computing systems in a highly modular way, so that new certifier components may be inserted according to the verification tasks required in the certification process.
The certification process may be carried out in parallel. For this, certifier components are defined as parallel certification systems, analogous to parallel computing systems. A parallel certification system contains a certification-workflow component and a set of tactical components, each one providing access to an existing certification tool or infrastructure running on a parallel computing platform. Thus, within tactical components, parallel computing may help exploit the maximum performance of the underlying verification infrastructures to accelerate the certification process.
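The structure described above can be sketched as follows, assuming each tactical component is reduced to a plain function that returns a verdict. The names `Tactical`, `certify`, and `is_certified`, as well as the two stand-in checks, are hypothetical and do not reflect the actual HPC Shelf API; a thread pool stands in for the parallel computing platform.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Tactical:
    """Hypothetical wrapper around one verification tool."""
    name: str
    check: Callable[[str], bool]  # takes a component artifact, returns pass/fail


def certify(artifact: str, tacticals: List[Tactical]) -> Dict[str, bool]:
    """Toy certification workflow: run all tacticals concurrently, collect verdicts."""
    if not tacticals:
        return {}
    with ThreadPoolExecutor(max_workers=len(tacticals)) as pool:
        futures = {t.name: pool.submit(t.check, artifact) for t in tacticals}
        return {name: future.result() for name, future in futures.items()}


def is_certified(verdicts: Dict[str, bool]) -> bool:
    """A component is certified only if at least one tactical ran and all passed."""
    return bool(verdicts) and all(verdicts.values())


# Stand-in checks (assumptions); real tacticals would invoke provers or model checkers.
tacticals = [
    Tactical("non_empty", lambda src: len(src.strip()) > 0),
    Tactical("has_contract", lambda src: "requires" in src and "ensures" in src),
]

verdicts = certify("requires n >= 0; ensures result >= 0;", tacticals)
```

Keeping each tool behind a uniform `check` interface is what allows new tacticals to be added without changing the workflow that coordinates them.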
Summing up, the main artifacts produced by the work whose results are reported in this article are the following ones:
- A general-purpose certification framework for HPC Shelf;
- A class of certifier components, named C4, for the certification of computation components of HPC Shelf;
- Another class of certifier components, named SWC2, for the certification of workflow components;
- A set of tactical components to make the bridge between the above certifier components and existing formal verification tools.
C4 and SWC2 serve as a proof-of-concept validation of the certification framework of HPC Shelf. The framework has also been evaluated in the context of other cloud-based software certification initiatives, with emphasis on works related to Verification-as-a-Service (VaaS), HPC, and automatic software-verification tools. From this assessment, the following outstanding features and contributions have been identified in favor of the certification framework of HPC Shelf:
- It is general purpose, in the sense that it is not intended to certify a particular requirement, although the case studies presented in this article focus on the verification of functional and behavioral properties through deductive program verification and model checking tools.
- It does not certify only software components, but any kind of component, including components representing hardware elements, such as parallel computing platforms in HPC Shelf.
- It is fully component-oriented, with seamless integration with the environment, in the sense that certification is introduced by certifier and tactical components that may encapsulate certification tools.
- It is the first certification framework in the context of component-based high performance computing (CBHPC), where certification may avoid the wasting of time and financial resources due to delays, crashes, and wrong outputs in the execution of long-running computations.
- It is the first VaaS framework applied in the context of HPC.
- It introduces new ideas for VaaS framework design, such as:
  - the use of component-orientation to support a higher level of abstraction with respect to underlying formal verification tools;
  - the clear role separation among certification authorities, component developers, and system builders (application providers);
- It presents a general method for exploiting parallel processing to speed up certification tasks at several levels, by using the parallel computing infrastructure where the parallel components subject to certification run.
Article structure. After a description of HPC Shelf in Section 2, its certification framework is introduced in Section 3. Sections 4 and 5 detail, respectively, the architecture of C4 and SWC2 certifiers. Next, Section 6 presents some case studies to demonstrate the use of C4 and SWC2 certifiers in parallel computing systems. A discussion of related works obtained through a systematic search in scientific databases is presented in Section 7, with emphasis on certification of component-based software systems and VaaS. Finally, Section 8 presents concluding remarks and points to further work.
Section snippets
HPC Shelf
HPC Shelf is a cloud computing platform that provides HPC services for providers of domain-specific applications. An application is a problem-solving environment through which specialist users, the end users of HPC Shelf, specify problems and obtain computational solutions for them. It is assumed that these solutions are computationally intensive, thus demanding the use of large-scale parallel computing infrastructure, i.e. comprising multiple parallel computing platforms engaged in a single
The certification framework
For the purpose of leveraging component certification in HPC Shelf, a certification framework is introduced in this section. It encompasses a set of component kinds, composition rules and design patterns. They provide an environment where certification tools can be encapsulated into components to provide some level of assurance to application providers and component developers that components of parallel computing systems meet a predetermined set of requirements prior to their instantiation.
C4: certifiers for computation components
Using the certification framework introduced in Section 3, a class of certifiers for computation components, called C4, is proposed. The name is an acronym for Certifier Components of Computation Components.
The units of a computation component may be viewed as processes running on different processing nodes of a virtual platform. These units can be aggregated into a parallel unit that represents a team of units programmed in the SPMD (Single Program Multiple Data) parallel programming pattern,
SWC2: certifiers for workflow components
As described in Section 2, in the architecture of a parallel computing system, a singleton component called workflow represents the orchestration engine. In the current implementation of HPC Shelf, it may be implemented in two ways:
- •
using a host programming language (currently, C#), by activating actions using the ITaskBinding interface of action ports (Listing 1);
- •
using SAFeSWL, a specific-purpose scientific workflow language for orchestrating parallel components, driving the execution of
Case studies
In this section, three case studies demonstrate the certification framework of HPC Shelf, as well as the use of C4 and SWC2 certifiers.
Performance evaluation experiments have been performed to show that the cost of component certification may be reasonable and will not make it impractical to run a parallel computing system with some certifiable components. In order to isolate the pure verification times from the overheads of parallel certification system deployment, the sequential times
Related work
The certification framework of HPC Shelf has not been designed as an incremental evolution of some pre-existing certification framework that could have been taken as a basis. It has been developed from scratch, under its own assumptions, to meet the particular requirements of HPC Shelf. Thus, with regard to the comparison of the certification framework herein proposed with other related works, our main challenge has been to study the literature to find such related works, and then to study
Conclusions and future work
This article has proposed a certification framework for HPC Shelf, aimed at certifying components of different kinds with respect to a given set of requirements. By using it, different sorts of certification components, with interfaces to existing formal verification tools, can be added to a parallel computing system. Certification of isolated components, including workflows that orchestrate parallel computing systems, is therefore provided as another service in the cloud, in a truly reflexive
References (94)
- et al. An institutional theory for #-components. Electron. Notes Theor. Comput. Sci. (2008)
- et al. Contextual abstraction in a type system for component-based high performance computing platforms. Sci. Comput. Program. (2016)
- et al. A case study on expressiveness and performance of component-oriented parallel programming. J. Parallel Distrib. Comput. (2013)
- Results on the propositional μ-calculus. Theor. Comput. Sci. (1983)
- et al. Aeolus: a component model for the cloud. Inf. Comput. (2014)
- et al. Fault-aware management protocols for multi-component applications. J. Syst. Softw. (2018)
- Bigraphs and their algebra. Electron. Notes Theor. Comput. Sci. (2008)
- et al. A BRS-based approach to model and verify cloud systems elasticity. Proc. Comput. Sci. (2015)
- et al. Multi-tenant Verification-as-a-Service (VaaS) in a cloud. Simul. Model. Pract. Theory (2016)
- et al. A Scientific Workflow Management System for orchestration of parallel components in a cloud of large-scale parallel processing services. Sci. Comput. Program. (2019)
- Web portals for high-performance computing: a survey. ACM Trans. Web
- Certification of software components. IEEE Trans. Softw. Eng.
- Certifying off-the-shelf software components. Computer
- Software component certification. Computer
- Software component certification: a survey
- Certifying software component attributes. IEEE Softw.
- Towards an architecture for component-oriented parallel programming. Concurr. Comput.: Pract. Exp.
- MapReduce: simplified data processing on large clusters. Commun. ACM
- MapReduce with components for processing big graphs
- Contextual contracts for component-based resource abstraction in a cloud of HPC services
- An Introduction to the MPI Standard
- Separation logic: a logic for shared mutable data structures
- An axiomatic proof technique for parallel programs I. Acta Inform.
- Correctness proofs of distributed termination algorithms. ACM Trans. Program. Lang. Syst.
- Protocol-based verification of message-passing parallel programs
- VCC: a practical system for verifying concurrent C
- VeriFast: a powerful, sound, predictable, fast verifier for C and Java
- Frama-C
- Boogie: a modular reusable verifier for object-oriented programs
- Why3: shepherd your herd of provers
- The Alt-Ergo automated theorem prover
- CVC3
- CVC4
- Z3: an efficient SMT solver
- E – a brainiac theorem prover. AI Commun.
- SPASS version 3.5
- Vampire
- The Coq Proof Assistant Reference Manual, LogiCal Project, version 8.0
- Isabelle/HOL: A Proof Assistant for Higher-Order Logic
- Formal verification of practical MPI programs. SIGPLAN Not.
- CIVL: the concurrency intermediate verification language
- Let's verify this with Why3. Int. J. Softw. Tools Technol. Transf.
- UML based grid workflow modeling under ASKALON
- Sedna: A BPEL-Based Environment for Visual Scientific Workflow Modeling
- Scientific workflow management and the Kepler system. Concurr. Comput.: Pract. Exp.
- Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Sci. Program.
- The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res.