freud: A software suite for high throughput analysis of particle simulation data☆☆

https://doi.org/10.1016/j.cpc.2020.107275

Abstract

The freud Python package is a library for analyzing simulation data. Written with modern simulation and data analysis workflows in mind, freud provides a Python interface to fast, parallelized C++ routines that run efficiently on laptops, workstations, and supercomputing clusters. The package provides the core tools for finding particle neighbors in periodic systems, and offers a uniform API to a wide variety of methods implemented using these tools. As such, freud users can access standard methods such as the radial distribution function as well as newer, more specialized methods such as the potential of mean force and torque and local crystal environment analysis with equal ease. Rather than providing its own trajectory data structure, freud operates either directly on NumPy arrays or on trajectory data structures provided by other Python packages. This design allows freud to transparently interface with many trajectory file formats by leveraging the file parsing abilities of other trajectory management tools. By remaining agnostic to its data source, freud is suitable for analyzing any particle simulation, regardless of the original data representation or simulation method. When used for on-the-fly analysis in conjunction with scriptable simulation software such as HOOMD-blue, freud enables smart simulations that adapt to the current state of the system, allowing users to study phenomena such as nucleation and growth.

Program summary

Program Title: freud

Program Files doi: http://dx.doi.org/10.17632/v7wmv9xcct.1

Licensing provisions: BSD 3-Clause

Programming language: Python, C++

Nature of problem: Simulations of coarse-grained, nano-scale, and colloidal particle systems typically require analyses specialized to a particular system. Certain more standardized techniques – including correlation functions, order parameters, and clustering – are computationally intensive tasks that must be carefully implemented to scale to the larger systems common in modern simulations.

Solution method: freud performs a wide variety of particle system analyses, offering a Python API that interfaces with many other tools in computational molecular sciences via NumPy array inputs and outputs. The algorithms in freud leverage parallelized C++ to scale to large systems and enable real-time analysis. The library’s broad set of features encodes few assumptions compared to other analysis packages, enabling analysis of a broader class of data ranging from biomolecular simulations to colloidal experiments.

Additional comments including restrictions and unusual features:

1. freud provides very fast parallel implementations of standard analysis methods like RDFs and correlation functions.

2. freud includes the reference implementation for the potential of mean force and torque (PMFT).

3. freud provides various novel methods for characterizing particle environments, including the calculation of descriptors useful for machine learning.

The source code is hosted on GitHub (https://github.com/glotzerlab/freud), and documentation is available online (https://freud.readthedocs.io/). The package may be installed via pip install freud-analysis or conda install -c conda-forge freud.

Introduction

Molecular simulation is a crucial pillar in the investigation of scientific phenomena. Increased computational resources, better algorithms, and new hardware architectures have made it possible to simulate complex systems over longer timescales than ever before [1], [2], [3], [4], [5]. The sheer volume of data necessitates computationally efficient analysis tools, while the diversity of data requires flexible tools that can be adapted for specific systems. Additionally, to support scientists with limited prior computing experience, tools must be usable without extensive knowledge of the underlying code.

Numerous software packages that satisfy these requirements have been developed in recent years. Tools such as MDTraj [6], MDAnalysis [7], LOOS [8], MMTK [9], and VMD [10] provide efficient implementations of various standard analysis methods. Although powerful, such tools are generally limited in scope to all-atom simulations, particularly biomolecular simulations. This focus is manifested not only through the features these tools provide, but also in their general design philosophies.

Perhaps the most pronounced characteristic of such tools is a strong emphasis on trajectory management, which includes parsing trajectory files and supporting extensive topology selection features to enable, for instance, selecting all residues or atoms in a protein backbone. Although such tools are crucial for working with topologies in atomistic simulations, they are frequently cumbersome for working with coarse-grained simulation data where the trivial selection (all particles in the system) is the most common selection for various analyses. Moreover, such topology selection tools make assumptions that are inappropriate for non-atomistic systems: “bonding” in colloidal systems, for instance, is typically based on whether two particles are found to be in the same neighborhood by some distance-based metric, not by the presence of a true chemical bond. Since such determination of nearest neighbors is highly dynamic and parameter-dependent, it must be calculated on-the-fly and cannot be stored in a trajectory.
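The parameter dependence of distance-based "bonding" can be illustrated with a minimal sketch. The code below is plain Python written for this illustration, not freud's implementation or API; the function names `minimum_image` and `distance_bonds`, the cubic periodic box, and the sample coordinates are all assumptions. It shows that the same configuration yields different bond lists for different cutoff radii, which is why such neighbor information has to be recomputed rather than stored with the trajectory.

```python
import itertools
import math


def minimum_image(delta, box_length):
    """Wrap one separation component into [-L/2, L/2] for a periodic cubic box."""
    return delta - box_length * round(delta / box_length)


def distance_bonds(points, box_length, r_cut):
    """Return index pairs (i, j), i < j, whose periodic separation is below r_cut."""
    bonds = []
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        d = [minimum_image(a - b, box_length) for a, b in zip(p, q)]
        if math.sqrt(sum(c * c for c in d)) < r_cut:
            bonds.append((i, j))
    return bonds


# Hypothetical coordinates in a cubic box of edge length 10.
points = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (9.7, 0.5, 0.5), (5.0, 5.0, 5.0)]

# Particles 0 and 2 are close only across the periodic boundary.
tight = distance_bonds(points, box_length=10.0, r_cut=0.9)
# A slightly larger cutoff also bonds particles 0 and 1.
loose = distance_bonds(points, box_length=10.0, r_cut=1.2)
print(tight)
print(loose)
```

The brute-force pair loop scales quadratically with the number of particles and is only meant to make the definition concrete; it says nothing about how an optimized library would organize the search.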

Another inconvenient but almost universal implementation choice is to directly tie analysis methods to trajectories by writing code that acts directly on some in-memory representation of a trajectory. This direct linkage is generally inflexible because it inhibits pre-processing of the data before running the analysis, which is often crucial to analyzing more specialized systems. More importantly, existing tools emphasize implementations of highly specific analyses involving, for instance, hydrogen bonding and protein secondary structure (using, e.g., DSSP [11]), which are far less useful for analyzing non-biomolecular systems. The predominant analyses of coarse-grained, colloidal-scale, or nanoparticle simulations usually involve measurements like numbers of nearest neighbors, diffraction patterns, or bond-orientational order parameters. These analyses bear little relation to the analyses performed for atomistic systems. These considerations suggest a need for a different type of analysis package that offers different methods than most existing tools.

In this paper we introduce freud, an open-source simulation analysis toolkit that addresses these needs. All inputs to and outputs from freud are numerical arrays of data, and the package makes no reference to predefined notions of atoms or molecules. As a result, freud can analyze particle-based data from both experiments and simulations regardless of the specific tools, methods, or software that were used to generate it. The package provides a Python Application Programming Interface (API) for accessing fast methods implemented in C++, and it implements numerous specific methods such as radial distribution functions and correlation functions that are common in the field of soft-matter physics (see Fig. 1). Prior works have used freud for: determining spatial correlation functions and potentials of mean force and torque (PMFTs) in two dimensions [1]; calculating Steinhardt order parameters for identifying solid-like particles [12], [13]; computing spherical harmonics for machine learning on crystal structures [14]; optimizing pair potentials for designing complex crystals [15]; calculating strain fields by finding neighbors of particles against a uniform grid [16]; finding PMFTs in depletion-mediated self-assembly of hard cuboctahedra [17]; measuring rotational degrees of freedom in entropically ordered systems [18]; umbrella sampling of solid–solid phase transitions using Steinhardt order parameters [19]; evaluating PMFTs in analysis of two-dimensional shape allophiles [20]; and more. The freud library is designed to work well with coarse-grained particle models, such as those used in simulations of anisotropic nanoparticles, colloidal crystals, and polymers, and its methods are particularly useful for studies of phase transitions and critical phenomena in such systems. 
The package is likely to be of greatest interest to scientific communities in materials science, chemical engineering, and physics, though many of its analysis methods would be useful in generic particle systems. The freud library also integrates well into the scientific Python ecosystem, especially in data pipelines for machine learning and visualization [21].
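As a rough illustration of what an array-in, array-out analysis looks like, the following sketch computes a radial distribution function from nothing but a list of coordinates and a box size. It is a pure-Python stand-in written for this example and does not reproduce freud's API or its C++ implementation; the function name `radial_distribution`, the cubic box, and the ideal-gas test data are assumptions. The cutoff `r_max` is assumed to be at most half the box length so that the minimum image convention is valid.

```python
import math
import random


def radial_distribution(points, box_length, r_max, n_bins):
    """Histogram pair distances in a periodic cubic box, normalized by the ideal-gas count."""
    n = len(points)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            sq = 0.0
            for a, b in zip(points[i], points[j]):
                d = a - b
                d -= box_length * round(d / box_length)  # minimum image convention
                sq += d * d
            r = math.sqrt(sq)
            if r < r_max:
                k = min(int(r / dr), n_bins - 1)  # guard against round-off at the edge
                counts[k] += 2  # each pair contributes to both particles
    density = n / box_length**3
    centers, g = [], []
    for k, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        centers.append((k + 0.5) * dr)
        g.append(c / (n * density * shell))
    return centers, g


# Uncorrelated (ideal-gas) positions: g(r) should fluctuate around 1.
random.seed(0)
points = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(400)]
centers, g = radial_distribution(points, box_length=10.0, r_max=4.0, n_bins=8)
for r, value in zip(centers, g):
    print(f"{r:.2f} {value:.3f}")
```

Because the function takes only coordinates and a box length, the same call works whether the positions came from a simulation trajectory or from experimental particle tracking, which is the point of the data-source-agnostic design described above.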

The paper is organized as follows. We first address the core design principles that went into building freud in Section 2. Section 3 focuses more specifically on the details of the code, including information on class structures. Section 4 describes the various analysis methods in freud and details their uses. Finally, in Section 5 we provide some example code demonstrating the usage of freud. The figures in this paper are rendered using Matplotlib [22] unless otherwise noted.


Design

Many of the best known tools for analyzing molecular simulations are built into either simulation toolkits (such as LAMMPS [23], GROMACS [24], or the cpptraj [25] plugin to Amber [26]) or visualization toolkits (such as VMD [10], PyMOL [27], or OVITO [28]). Although most of these have introduced varying degrees of scripting support over the years, the analyses built into simulation toolkits are primarily focused on performing one-shot analyses on trajectory files directly from the command line.

Implementation

The freud package is entirely object-oriented, with two core C++ classes: the Box class, which encapsulates all logic associated with periodicity in arbitrary triclinic boxes (boxes with 3 linearly independent basis vectors); and the NeighborQuery class, which facilitates efficiently finding, storing, and iterating over nearest neighbors. In keeping with the Python ethos, box objects in freud may be constructed from a variety of inputs. Any method in freud that accepts a box object also accepts

General utilities

The general utilities in freud are contained in two modules: box and locality. The box module contains the core Box class. The locality module contains the NeighborQuery abstract class, which defines the standardized query API. NeighborQuery results (neighboring particle pairs) can be obtained dynamically or stored in the NeighborList class provided by the locality module.
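To make the periodicity logic of a triclinic box concrete, here is a minimal sketch of wrapping a point back into the primary image. It is not the Box class itself: the function name `wrap_triclinic`, the parameterization by three edge lengths and three tilt factors (xy, xz, yz), and the choice of a box centered on the origin are assumptions made for this illustration. The idea is to convert to fractional coordinates along the box vectors, shift each into [-0.5, 0.5), and convert back.

```python
import math


def wrap_triclinic(point, lengths, tilts=(0.0, 0.0, 0.0)):
    """Wrap a point into an origin-centered triclinic box.

    Assumed box vectors: a1 = (Lx, 0, 0), a2 = (xy*Ly, Ly, 0),
    a3 = (xz*Lz, yz*Lz, Lz).
    """
    lx, ly, lz = lengths
    xy, xz, yz = tilts
    x, y, z = point
    # Solve the upper-triangular system for fractional coordinates.
    f3 = z / lz
    f2 = (y - yz * lz * f3) / ly
    f1 = (x - xy * ly * f2 - xz * lz * f3) / lx
    # Shift each fractional coordinate into [-0.5, 0.5).
    f1, f2, f3 = (f - math.floor(f + 0.5) for f in (f1, f2, f3))
    # Convert back to Cartesian coordinates.
    return (
        lx * f1 + xy * ly * f2 + xz * lz * f3,
        ly * f2 + yz * lz * f3,
        lz * f3,
    )


# Cubic box: a point just outside the +x face reappears near the -x face.
cubic = wrap_triclinic((6.0, 0.0, 0.0), lengths=(10.0, 10.0, 10.0))
# Sheared box: crossing the +y face also shifts x by xy * Ly.
sheared = wrap_triclinic((0.0, 6.0, 0.0), lengths=(10.0, 10.0, 10.0),
                         tilts=(0.5, 0.0, 0.0))
print(cubic)
print(sheared)
```

In the sheared case the wrapped image is the original point minus the second box vector, so the x coordinate changes even though only the y face was crossed. Handling this in one place is what lets neighbor-finding code treat orthorhombic and tilted boxes uniformly.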

Box periodicity is built in at the lowest level of the NeighborQuery subclasses, which are highly optimized for this use

Examples

In this section, we demonstrate the use of freud in conjunction with the broader scientific software ecosystem. The code for these examples and many others is available at https://github.com/glotzerlab/freud-examples.

Conclusion

freud is a high-performance Python library for analyzing particle simulations. Among simulation analysis packages, freud is unique due to its emphasis on coarse-grained simulations and its flexibility. Its high-performance C++ back-end makes freud a suitable solution for large-scale, high-throughput simulation analysis, while its simple, compact API is highly amenable to integration with other tools for, e.g., machine learning applications. The package’s API also promotes the prototyping of new

Acknowledgments

Support for the design and development of freud has evolved over time and with programmatic research directions. Conceptualization and early implementations were supported in part by the DOD/ASD(R&E) under Award No. N00244-09-1-0062 and also by the National Science Foundation, Integrative Graduate Education and Research Traineeship, Award # DGE 0903629 (E.S.H. and M.P.S.). A majority of the code development including all public code releases was supported by the National Science Foundation,

References (67)

  • Freddolino P.L. et al., Biophys. J. (2008)
  • McGibbon R.T. et al., Biophys. J. (2015)
  • Humphrey W. et al., J. Mol. Graph. (1996)
  • Plimpton S., J. Comput. Phys. (1995)
  • Berendsen H. et al., Comput. Phys. Comm. (1995)
  • Anderson J.A. et al., Comput. Phys. Comm. (2016)
  • Anderson J.A. et al., J. Comput. Phys. (2008)
  • Glaser J. et al., Comput. Phys. Comm. (2015)
  • Anderson J.A. et al., Comput. Phys. Comm. (2016)
  • Keys A.S. et al., J. Comput. Phys. (2011)
  • Anderson J.A. et al., Phys. Rev. X (2017)
  • Simon A.J. et al., Nature Chem. (2019)
  • Niethammer C. et al., J. Chem. Theory Comput. (2014)
  • Shaw D.E. et al.
  • Michaud-Agrawal N. et al., J. Comput. Chem. (2011)
  • Romo T. et al.
  • Hinsen K., J. Comput. Chem. (2000)
  • Kabsch W. et al., Biopolymers (1983)
  • Reinhart W.F. et al., J. Chem. Phys. (2018)
  • Howard M.P. et al., J. Chem. Phys. (2018)
  • Spellings M. et al., AIChE J. (2018)
  • Adorf C.S. et al., J. Chem. Phys. (2018)
  • Vansaders B. et al., Phys. Rev. Mater. (2018)
  • Karas A.S. et al., Soft Matter (2016)
  • Antonaglia J.A. et al., Mapping disorder in entropically ordered crystals (2018)
  • Du C.X. et al., Proc. Natl. Acad. Sci. USA (2016)
  • Harper E.S. et al., Soft Matter (2015)
  • Dice B. et al.
  • Hunter J.D., Comput. Sci. Eng. (2007)
  • Roe D.R. et al., J. Chem. Theory Comput. (2013)
  • Case D.A. et al., J. Comput. Chem. (2005)
  • Schrödinger L., The pymol molecular graphics system, version 2.3 (2019)
  • Stukowski A., Modelling Simulation Mater. Sci. Eng. (2010)

The review of this paper was arranged by Prof. D.P. Landau.

☆☆ This paper and its associated computer program are available via the Computer Physics Communications homepage on ScienceDirect (http://www.sciencedirect.com/science/journal/00104655).
