PHAST: Protein-like heteropolymer analysis by statistical thermodynamics

https://doi.org/10.1016/j.cpc.2017.01.021

Abstract

PHAST is a software package written in standard Fortran, with MPI and CUDA extensions, able to efficiently perform parallel multicanonical Monte Carlo simulations of single or multiple heteropolymeric chains, as coarse-grained models for proteins. The output data can be straightforwardly analyzed within its microcanonical Statistical Thermodynamics module, which computes the entropy, caloric curve, specific heat and free energies. As a case study, we investigate the aggregation of heteropolymers bioinspired by Aβ25–33 fragments and their cross-seeding with IAPP20–29 isoforms. Excellent parallel scaling is observed, even under numerically difficult first-order-like phase transitions, which are properly described by the built-in fully reconfigurable force fields. Moreover, the package is free and open source, which should encourage users to readily adapt it to specific purposes.

Program summary

Program Title: PHAST

Program Files doi: http://dx.doi.org/10.17632/ggds2grzw9.1

Licensing provisions: GNU GPL version 3

Programming language: FORTRAN 90, MPICH 3.0.4, CUDA 8.0

Nature of the problem: Powerful multicore processors (CPUs) and Graphics Processing Units (GPUs) have become popular and cost-effective, enabling thermostatistical studies of complex molecular systems to be performed even on personal computers. PHAST provides not only an easily reconfigurable parallel program for Monte Carlo simulations of linear heteropolymers, as coarse-grained models of proteins, but also the automated microcanonical thermodynamic analysis of those systems.

Solution method: PHAST has three main modules for complementary tasks: the first maps the sequence of any .pdb file to the program's inner lexicon by using a configurable hydrophobic scale, the main simulation module performs parallel Monte Carlo simulations in the multicanonical (MUCA) ensemble, and the analysis module extracts microcanonical observables from the MUCA weights.

External routines/libraries: cuRAND (CUDA), Grace-5.1.22 or higher.

Introduction

Linear heteropolymeric chains of amino acids are known as polypeptides. Proteins, on the other hand, are a large class of biological polymers containing at least one long polypeptide. Together, they constitute a set of macromolecules performing a vast array of functions within organisms, whose specificities strongly depend on the detailed intramolecular interactions that lead to three-dimensional (native) structures through a folding mechanism. The thermodynamic hypothesis [1] states that the native structure of a protein is a unique, stable and kinetically accessible minimum of the free energy, solely determined by its (primary) sequence of amino acids. However, occasional misfolding may produce dysfunctional proteins, inducing the formation of cytotoxic amyloid aggregates, which can culminate in degenerative diseases such as type 2 Diabetes Mellitus (DM2) [2] or Alzheimer's disease (AD) [3].

Since those biopolymers are typical representatives of so-called small systems, where in general the equivalence of statistical ensembles does not hold (see [4] and references therein), the direct computation of their density of states is a well-suited route to the microcanonical thermodynamics of the system [5]. Nevertheless, performing such calculations with high enough accuracy requires accumulating large amounts of statistical data through Monte Carlo methods, such as Wang–Landau [6] or multicanonical (MUCA) sampling [7]. This poses a huge computational challenge, which can be reasonably alleviated by combining efficient parallel algorithms with coarse-grained molecular force fields.
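The step from a density of states to microcanonical observables can be illustrated with a minimal Python sketch. It is not PHAST's Fortran analysis code: it assumes converged multicanonical weights W(E) ≈ 1/g(E) tabulated on a uniform energy grid, units with k_B = 1, and plain finite differences. The function names `central_diff` and `microcanonical_observables` are hypothetical.

```python
import math

def central_diff(y, h):
    """Derivative of y on a uniform grid of spacing h.

    Central differences in the interior, one-sided differences at the ends.
    """
    n = len(y)
    d = [0.0] * n
    d[0] = (y[1] - y[0]) / h
    d[-1] = (y[-1] - y[-2]) / h
    for i in range(1, n - 1):
        d[i] = (y[i + 1] - y[i - 1]) / (2.0 * h)
    return d

def microcanonical_observables(log_weights, bin_width):
    """Return S(E), beta(E) = dS/dE and C(E) = -beta^2 / (dbeta/dE).

    Assumption: W(E) ~ 1/g(E), so S(E) = ln g(E) = -ln W(E) up to an
    additive constant (k_B = 1).
    """
    entropy = [-lw for lw in log_weights]
    beta = central_diff(entropy, bin_width)       # caloric curve (inverse temperature)
    dbeta = central_diff(beta, bin_width)
    heat = [(-b * b / db) if db != 0.0 else math.inf
            for b, db in zip(beta, dbeta)]        # microcanonical specific heat
    return entropy, beta, heat

# Toy check with S(E) = 1.5 ln E, for which beta = 1.5 / E and C = 1.5.
energies = [1.0 + 0.01 * i for i in range(201)]
toy_log_weights = [-1.5 * math.log(e) for e in energies]
S, beta, C = microcanonical_observables(toy_log_weights, 0.01)
print(f"beta(E=2) = {beta[100]:.3f}, C(E=2) = {C[100]:.3f}")
```

Real multicanonical data are noisy, so in practice the entropy would usually be smoothed or fitted before differentiation. A region where dβ/dE becomes positive gives a negative specific heat, the usual microcanonical signature of a first-order-like transition.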

To meet these demands we designed PHAST, a simple and easy-to-use open source package that enables users to simulate and analyze the microcanonical thermostatistics of general models for semi-flexible linear polymers [8], [9], such as coarse-grained proteins [10], [11], [12], [13]. The program provides fully reconfigurable built-in force fields that can be readily modified and extended, which makes it especially apt for prototyping many-body interacting systems, such as polymers embedded in crowded solutions [14]. It is written in standard Fortran and has MPI and CUDA extensions [15], [16], so it can run efficiently on most modern parallel computer architectures. Distribution for commercial or military purposes is forbidden, but we hope that PHAST proves useful and may eventually be incorporated into other programs, in which case we kindly request to be informed. It is also expected to be acknowledged in all resulting publications.

The article is organized as follows. In Section 2 we review some simulation background, including coarse-grained modeling and the derivation of microcanonical thermostatistics from parallel multicanonical simulations. Section 3 gives an in-depth overview of the application modules. In Section 4 two concrete case studies involving heteropolymers bioinspired by amyloidogenic proteins are presented: the aggregation of Aβ25–33 fragments and their further cross-seeding with IAPP20–29 isoforms are investigated in the context of a minimal coarse-grained model. The parallel scaling of such simulations is rigorously addressed, while their physical significance is outlined, when possible, by comparison with other results from the literature. Section 5 presents our conclusions and perspectives for further developments.


Modeling proteins and polymers

Coarse-graining is a widely employed modeling strategy to reduce the degrees of freedom of many-body macromolecular systems such as heteropolymers [9], [17]. Moreover, the study of protein folding and aggregation, especially in crowded media [18], has also largely benefited from such an approach [17]. In this vein, PHAST incorporates a configurable force field able to describe semi-flexible linear heteropolymers

Software framework

The PHAST package comprises the following modules:

  • SET_INPUT: prepares input protein sequences from PDB files.

  • PHAST: the main engine to run parallel multicanonical simulations.

  • ANALYST: computes the microcanonical thermostatistics from MUCA simulations.
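The kind of mapping performed by the first module can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SET_INPUT implementation. It takes a one-letter amino-acid string rather than parsing a PDB file, and the set `HYDROPHOBIC` is a hypothetical two-class split standing in for the configurable hydrophobic scale that PHAST actually reads. Following the usual AB-model convention, A denotes a hydrophobic residue and B a hydrophilic one.

```python
# Hypothetical hydrophobic class, for illustration only; PHAST uses a
# configurable hydrophobic scale whose values are not reproduced here.
HYDROPHOBIC = frozenset("ACGILMPV")

# The twenty standard amino acids in one-letter code.
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")

def to_ab_lexicon(sequence, hydrophobic=HYDROPHOBIC):
    """Map a one-letter amino-acid sequence to an A/B string.

    'A' marks residues in the hydrophobic set, 'B' all other standard residues.
    """
    out = []
    for residue in sequence.upper():
        if residue not in STANDARD_RESIDUES:
            raise ValueError(f"unknown residue: {residue!r}")
        out.append("A" if residue in hydrophobic else "B")
    return "".join(out)

# Residues 25-33 of the amyloid-beta peptide, written in one-letter code.
print(to_ab_lexicon("GSNKGAIIG"))
```

Passing a different `hydrophobic` set changes the resulting AB sequence, which mimics the effect of swapping the hydrophobic scale in the real module.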

It is not possible to give here a highly detailed description of all program routines. Instead, we address the main features and data workflows of the aforementioned modules in the following subsections, which

Example runs

As case studies we have investigated two protein systems by using the AB-model limit [10] of the Hamiltonian Eq. (3). Such modeling does not allow for the prediction of protein structures, but may provide a useful method to learn about some general thermodynamic mechanisms underlying biological structural phase transitions [12].
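For orientation, the energy of a single chain in the widely used off-lattice AB model can be evaluated as in the Python sketch below. Eq. (3) is not reproduced in this excerpt, so the functional form is an assumption: a bending term (1/4)Σ(1 − cos θ_k) plus a Lennard-Jones-like term 4Σ[r_ij^−12 − C(σ_i, σ_j) r_ij^−6] over non-bonded pairs, with C = 1 for AA, 1/2 for BB and −1/2 for AB pairs. The names `ab_coupling` and `ab_energy` are hypothetical, and PHAST's configurable force field may include further terms.

```python
import math

def ab_coupling(a, b):
    """Sequence-dependent coefficient C(a, b) of the attractive r^-6 term."""
    if a == b:
        return 1.0 if a == "A" else 0.5   # AA strongly, BB weakly attractive
    return -0.5                            # AB pairs are purely repulsive

def ab_energy(positions, sequence):
    """Energy of one chain in the assumed AB-model form (reduced units)."""
    if len(positions) != len(sequence):
        raise ValueError("positions and sequence must have the same length")
    n = len(positions)

    # Bond vectors between consecutive monomers.
    bonds = [[q - p for p, q in zip(positions[k], positions[k + 1])]
             for k in range(n - 1)]

    # Bending energy: zero for a straight chain, 1/4 per right-angle bend.
    bending = 0.0
    for u, v in zip(bonds, bonds[1:]):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        bending += 0.25 * (1.0 - dot / norm)

    # Lennard-Jones-like term over all pairs that are not bonded neighbours.
    nonbonded = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r2 = sum((q - p) ** 2 for p, q in zip(positions[i], positions[j]))
            inv6 = 1.0 / r2 ** 3
            nonbonded += 4.0 * (inv6 * inv6 - ab_coupling(sequence[i], sequence[j]) * inv6)

    return bending + nonbonded

# Straight three-monomer chain with unit bond lengths.
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(ab_energy(chain, "AAA"))
```

In a Monte Carlo move only the terms involving the displaced monomer change, so a production code would update the energy incrementally rather than recompute the full double sum.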

Conclusions

We have presented PHAST, a package for simulating coarse-grained models of proteins and bioinspired polymers in the multicanonical ensemble. Despite its simple structure, the software comprises a reconfigurable force field which allows for an excellent parallel speedup. Additionally, its modules enable users to quickly map proteins downloaded from the PDB to an inner AB-lexicon (or its generalizations, as in [29]) as well as to automatically compute most of the system's microcanonical thermostatistics.

Acknowledgments

This is a long-term project whose author has benefited from enlightening discussions with Leandro G. Rizzi, Lieverton H. Queiroz, Mathias S. Costa, Mikhael C. Chrum, and Nelson A. Alves. This manuscript has been substantially improved by constructive suggestions from both anonymous reviewers. The Brazilian laboratories CENAPAD-SP and LNCC-RJ are acknowledged for providing machine time and helpful technical support, respectively, on their IBM P750 [34] and Santos Dumont [38] supercomputing

References (41)

  • Alves N.A. et al., Phys. A (2016); Frigori R.B. et al., Eur. Phys. J. B (2010)
  • Schnabel S. et al., Phys. Rev. E (2011)
  • Junghans C. et al., Phys. Rev. Lett. (2006)
  • Zierenberg J. et al., J. Chem. Phys. (2014)
  • Roseman M.A., J. Mol. Biol. (1988)
  • Bird R.B. et al., Dynamics of Polymeric Liquids (1987)
  • Mitsutake A. et al., J. Chem. Phys. (2003)
  • Berhanu W.M. et al., ACS Chem. Neurosci. (2013); Berhanu W.M. et al., PLoS ONE (2014)
  • Zierenberg J. et al., Phys. Proc. (2014)
  • Hollander P.A., Diabetes Care (2003)
  • Anfinsen C.B., Science (1973)
  • Melmed S., Williams Textbook of Endocrinology (2011)
  • Burns A. et al., BMJ (2009)
  • Gross D.H.E., Microcanonical Thermodynamics (2001)
  • Wang F. et al., Phys. Rev. Lett. (2001); Phys. Rev. E (2001); Comput. Phys. Comm. (2002)
  • Berg B.A. et al., Phys. Lett. B (1991); Berg B.A., J. Stat. Phys. (1996); Berg B.A., Fields Inst. Commun. (2000); Berg B.A., Comput. Phys. Comm. (2003)
  • Janke W. et al., Soft Matter (2016)
  • Stillinger F.H. et al., Phys. Rev. E (1993); Stillinger F.H. et al., Phys. Rev. E (1995)
  • Frigori R.B. et al., J. Chem. Phys. (2013)
  • Frigori R.B., Phys. Rev. E (2014)

This paper and its associated computer program are available via the Computer Physics Communications homepage on ScienceDirect (http://www.sciencedirect.com/science/journal/00104655).
