Key words

1 Introduction

The reduced size of bacterial cells makes the direct observation by microscopy of its inner biological processes more complicated than for the larger animal cells. The construction of in silico models is a promising alternative to the use of experimental techniques. In particular, model simulations can be used to dynamically study in vivo behaviors and test different hypotheses, prior to engagement into wet lab work [1].

Fluorescence in situ hybridization (FISH), which is a largely used and well-known process for the detection of pathogens (in particular bacteria), can be a practical and relevant case study to model. Typically, FISH consists of a fluorochrome-tagged oligonucleotide which binds to a complementary RNA sequence of interest in the bacterial cytosol [2]. However, the cytosol is surrounded by the bacterial envelope, which is a major barrier for the penetration of oligonucleotides. There are still major knowledge gaps on the factors underlying such limited diffusion, most notably, regarding the diffusion fluctuations observed between bacteria of different groups [3]. Revealing this is key to uncover means to improve the internalization of oligonucleotides and, consequently, boost the efficiency of FISH.

Among computational simulation tools, agent-based modeling (ABM) has successfully demonstrated the ability to simulate and observe various fine-grain aspects of biomolecular complexity [4, 5]. In this particular case, ABM may be a valuable tool to bring forward novel insights into the interaction of the oligonucleotides with the bacterial cell envelope and subsequent internalization.

Generally speaking, ABM represents the constituents of the system as autonomous decision-making entities, i.e., agents, and describes their main behavioral features based on theoretical and experimentally derived rules [6]. Agent behavior can range from the motion of a molecule to the binding of two molecules and may adapt depending on the surrounding environment and the perceived situation [4]. In particular, by modeling the behavior of individual agents as well as their social interplay, the behavior of the whole system and emergent, more complex, phenomena can be observed [7]. The challenge is to represent the complexity of the biological system under analysis as faithfully as possible without complicating the model in excess and increasing computational costs [8]. In FISH, it is important to represent the entire cell, including the bacterial envelope, given that the tagged oligonucleotide must cross the envelope to reach its target in the cytosol. Figure 1 summarizes the steps needed to undertake the ABM approach, from model construction to model validation.

Fig. 1
figure 1

Basic data workflow of biomolecular agent-based modeling. Data retrieval, through literature curation and database (DB) queries, is used to gather the data required to describe the molecules and their interaction logic. The simulator is used to validate and execute the model. Firstly, the positions for each agent are set, either by the user or by random assignment, ensuring that for each agent, the allocated position is free (i.e., not occupied by another agent). Once the initial positions are set, the simulator checks the environment, looking for possible interactions in the neighborhood of each agent. Valid interactions are encoded in the behavioral rules. If no interaction rule is found, the simulator reallocates the agents and starts another iteration. The simulation terminates when the number of iterations or any of the monitored variables reaches a preestablished threshold. The modeling outputs should then be analyzed and validated against experimental data through statistical comparison. If in silico results are in accordance with experimental results, the model is considered valid and may be used in new scenarios. If not, the model should be adjusted and reevaluated

In this chapter, we describe the implementation of such data workflow toward the construction of an ABM on FISH. Most notably, we describe the gathering of data and introduce available ABM frameworks and algorithms, pinpointing mandatory information toward both the construction and the simulation of such model. After a general description of the main methods, we introduce a case study of an oligonucleotide targeting the 23S rRNA of Escherichia coli (E. coli).

Fig. 2
figure 2

Representation of the modeling environment and E. coli cell

2 Methods

2.1 Literature Search

FISH involves multiple molecules interacting with each other and culminates in the detection of a fluorescence signal. This allows the detection and identification of bacterial cells, since they become fluorescent if the specific target sequence is present inside the bacterial cell. As a first modeling step, it is necessary to specify the most relevant agents involved in the system and compile data on their abundance, dimensions, shape, location within the cell (e.g., cytoplasmic membrane), and behavior (i.e., direction, speed, and interaction with other molecules), in order to be able to appropriately characterize them in the modeling environment.

Therefore, the proposed ABM considers the diffusion of molecules (in our case oligonucleotides) in the extracellular medium, in the cellular envelope of bacteria and within the cytoplasm. As such, the simulation environment, that is, the cell, including the bacterial cell envelope, must be characterized in terms of cell shape, dimensions, and volume.

2.1.1 Defining the Simulation Environment

The volume and dimensions of the cell are needed to adequately represent the simulation environment. Such parameters might be dependent on the strain, the phase in cell growth, and even the growth conditions. So, the data curation workflow, combining literature and database searches, should account for:

  1. 1.

    The cell’s dimensions, which depend on the cell’s geometrical shape. If the cell is spherical in shape, the radius is enough to characterize the simulation environment; if the cell is cylindrical, both the length and diameter must be defined. Refer to Note 1 in case of missing data.

  2. 2.

    The thickness of each layer of the cell wall and the cytoplasmic membrane. Refer to Note 1 in case of missing data.

  3. 3.

    The dimensions of the cytoplasm, which can be obtained by subtracting the thicknesses of the membranes.

  4. 4.

    The definition of the extracellular volume, which should be sufficiently large to accommodate a statistically meaningful number of molecules.

2.1.2 Defining the Agents

The bacterial envelope and the cytosol contain thousands of molecular species but, due to the inherent high computational costs, biomolecular models typically only represent the most relevant molecules [9]. Therefore, the data curation workflow should proceed by:

  1. 1.

    Listing the oligonucleotide, fluorochrome , target sequence, and additional molecules, such as non-target nucleic acids, membrane phospholipids, and proteins that might affect the oligonucleotide diffusion, which must be part of the modeled environment.

  2. 2.

    Performing independent queries for each molecule , looking for data on abundance, size, shape, volume, diffusion coefficient (Dc), and charge, if applicable. Refer to Notes 13 in case of missing data. Furthermore

    1. (a)

      If the size is not found in the literature, calculate the hydrodynamic radius (Rh). Spherical approximation is typically assumed when representing a molecule in a computational simulation [10] and, in particular, the Rh is an effective approximation of its dimensions. The Rh can be obtained as a function of the molecular weight (Mw) in Daltons. Use Eq. 1 for RNA molecules, Eq. 2 for proteins and Eqs. 3, 4, or 5 for DNA, depending whether its conformation is linear, coiled, or supercoiled [11]. For some modified nucleic acid mimics (NAMs) and some proteins, an ellipsoid-like conformation is more realistic. In this case, use Eq. 6 [12]. The volume of amino acids and sugars can be calculated using Eq. 8 [13]. Typically, molecules are represented as being spherical or ellipsoid. As such, the radii of amino acids and sugars can be extrapolated from the calculated volume.

      Equation 1—RNA molecules

      $$ {R}_{\mathrm{h}}\left({M}_{\mathrm{w}}\right)=0.0515\times {M}_{\mathrm{w}}^{0.38} $$
      (1)

      Equation 2—Proteins

      $$ {R}_{\mathrm{h}}\left({M}_{\mathrm{w}}\right)=0.0515\times {M}_{\mathrm{w}}^{0.329} $$
      (2)

      Equation 3—Linear DNA

      $$ {R}_{\mathrm{h}}\left({M}_{\mathrm{w}}\right)=0.024\times {M}_{\mathrm{w}}^{0.57} $$
      (3)

      Equation 4—Coiled DNA

      $$ {R}_{\mathrm{h}}\left({M}_{\mathrm{w}}\right)=0.0125\times {M}_{\mathrm{w}}^{0.59} $$
      (4)

      Equation 5—Supercoiled DNA

      $$ {R}_{\mathrm{h}}\left({M}_{\mathrm{w}}\right)=0.0145\times {M}_{\mathrm{w}}^{0.57} $$
      (5)

      Equation 6—Ellipsoid molecules

      $$ {R}_{\mathrm{h}}= aA{(p)}^{-1} $$
      (6)

      where p is the ellipsoid aspect ratio (p = ba), a is the measure of the polar semi-axis in nm, and b is the measure of the equatorial semi-axes in nm. A(p) can be calculated using Eq. 7.

      $$ A(p)=\frac{1}{\sqrt{\mid 1-{p}^2\mid }}\Big\{\mathrm{arctanarctan}\ \left(\sqrt{p^2-1}\right)\kern0.5em \left(\mathrm{oblate}:p>1\right)\ \mathrm{lnln}\ \frac{1+\sqrt{1-{p}^2}}{p}\kern0.75em \left(\mathrm{prolate}:p<1\right) $$
      (7)
      $$ V=\sum \mathrm{all}\ \mathrm{atom}\ \mathrm{contributions}-5.92{N}_{\mathrm{B}}-14.7{R}_{\mathrm{A}}-3.8{R}_{\mathrm{NA}} $$
      (8)

      where V is in Å3/molecule , the atom contributions are in cm3/mol, NB is the number of bonds present in the molecule and can be calculated by Eq. 9, RA is the number of aromatic rings, and RNA is the number of nonaromatic rings.

      $$ {N}_{\mathrm{B}}=N-1+{R}_{\mathrm{g}} $$
      (9)

      where Rg is the total number of ring structures and can be calculated as follows (Eq. 10):

      $$ {R}_{\mathrm{g}}={R}_{\mathrm{A}}+{R}_{\mathrm{NA}} $$
      (10)
    2. (b)

      If the Dc is not found in the literature, it can be calculated using the Stokes-Einstein equation (Eq. 11) for spherical particles. Keep in mind that this Equation does not take into account molecular crowding. Nonetheless, it is the best approximation when experimental data are missing. For ellipsoid molecules, use Eq. 12.

      Equation 11—Stokes-Einstein equation

      $$ {D}_{\mathrm{c}}=\frac{K_{\mathrm{b}}\times T}{6\times \pi \times \upeta \times {R}_{\mathrm{h}}} $$
      (11)

      where Kb is the Boltzmann’s constant with a value of 1.3806488 × 10−23 m2 kg/s2/K1, T is the temperature (estimated around 25 °C, equivalent to 298.12 K), and η is the viscosity of the medium in which the particle moves.

      Equation 12—Translational diffusion coefficient

$$ {tD}_{\mathrm{c}}=\frac{K_{\mathrm{b}}T}{6\times \pi \times \upeta \times a}A(p) $$
(12)
  1. 3.

    Calculating the average time (τ) required for a molecule to move from one position in the system to another (Eq. 13) [14]. This depends on the diffusion coefficient and on the size of the molecule

    $$ \tau =\frac{\lambda^2}{6{D}_c} $$
    (13)

    where λ is the longest dimension of the molecule . We will use the Rh multiplied by two as the λ for spherical molecules and the equatorial axis as the λ for ellipsoid molecules.

  2. 4.

    Adjusting agent populations according to the attainable total number of agents. The values of reference for molecule concentrations may be found in databases and papers. These values can be scaled up or down to agree with the volume defined for the simulation, and in particular, the number of agents should be adjusted in proportion to their abundance in the cell

2.1.3 Defining the Interactions Between Agents

To represent the biological system as faithfully as possible, all interactions between the molecules of the system (i.e., the agents) should be described. Therefore, the data curation workflow should proceed by:

  1. 1.

    Describing all the behavioral rules for each agent, including hybridization (i.e., when the tagged oligonucleotide and ribosome collide with each other and become attached), dissociation (i.e., when a tagged oligonucleotide and a ribosome hybridized to each other detach from one another), rebound (i.e., when two molecules collide and their direction must be changed, given that the hybridization interaction does not apply), and movement (i.e., the default action carried out by the molecule to a vacant position if none of the other interactions is possible). Equation 14 represents the hybridization and dissociation interactions

    Equation 14—Hybridization and Dissociation

    $$ {O}_{\mathrm{f}}+{R}_{\mathrm{f}}\leftrightarrow \mathrm{OR} $$
    (14)

    where Of represents a free oligonucleotide, Rf represents a free ribosome, OR represents an oligonucleotide attached (hybridized) to a ribosome.

  2. 2.

    Calculating the melting temperature (Tm) of the free oligonucleotide. The Tm can be defined as the temperature at which half of the oligonucleotides are hybridized and half are not, that is, the temperature at which the dissociation factor is 0.5. As such, the Tm can be used as an affinity parameter between the free oligonucleotide and the target sequence and a dissociation curve can be plotted, in order to get the dissociation factor at the temperature defined for the simulation. However, calculating the Tm is not straightforward and it depends on the type of oligonucleotide; the Tm can be calculated following the mathematical models described in [15]. For simplification, it can be assumed that whenever a collision between a free tagged oligonucleotide and a free ribosome occurs, they bind to each other. This assumption is only valid for a typical FISH temperature

2.2 Simulation

Modeling cellular processes at the single-molecule level is currently feasible but extremely demanding in terms of computing power and software programming expertise. Hence, computational biologists often resort to coarser approaches to simulate large biological processes [4]. In fact, most research facilities only have access to mid-performance computers and cannot realistically simulate highly detailed models [16]. As a possible solution, the use of simulation frameworks can reduce some of the burden of implementing ABM. There are several frameworks which can be applied to model biological processes. These can either be general ABM platforms, such as Swarm [17], or Multi-Agent Simulation of Neighbourhoods (MASON) [18], or specific biologic simulators, such as ReaDDy [19] and Smoldyn [20]. The more general ABM frameworks allow the simulation of three-dimensional environments and of complex systems, involving thousands of heterogeneous agents, but demand skilled and cost-effective programming. Unlike the more general ABM toolkits, most of the specialized biomolecular simulation frameworks do not support three-dimensional modeling, and the ones which do require optimization or extension efforts. Therefore, choosing a modeling platform is not straightforward and should be carefully thought out, based on extensive surveys, experience, and programming knowledge [16]. A review of the available simulation platforms is out of the scope of this work, as several reviews on the topic are already published [21,22,23].

To further reduce the demand in computing power, simpler simulation algorithms can be developed. In the single-thread approach, the agents are introduced in a scheduler, which keeps track of all the agents in the model and executes them one by one. So, at each simulation step, each agent evaluates its neighbors and checks if any of its behavioral rules apply. In an attempt to create even more efficient and less-demanding algorithms, Pérez-Rodriguez et al. developed two alternative approaches [16]. The first alternative is the parallel approach, which introduces only one agent, named the controller, in the scheduler. The controller creates different threads that will handle the population of agents, that is, instead of iterating over one agent at a time, it iterates over multiple agents at the same time. In addition, it uses an optimized map-like structure that stores the agents by their position in one of three axes. The second approach, named the partitioning approach, also introduces only one agent in the scheduler. This agent creates several environment partitions, each to be executed by a different thread and with its own map, which, again, decreases the number of iterations.

The simulation strategy should take into account the following decisions:

  1. 1.

    Choose the simulation framework, notably depending on the computational power at hand and familiarity with the required operating system and programming language.

  2. 2.

    Decide on the simulation approach to be implemented, namely single-thread, parallel, or spatially partitioned, or even, if desired, an in-house devised approach. Refer to Notes 46 for additional information.

  3. 3.

    Set the initial number of agents and their initial positions (if not randomly assigned by the simulator).

  4. 4.

    Make sure that the simulator interprets model data correctly (e.g., the dimensions and diffusion of the agents, as well as main interaction rules).

  5. 5.

    Initiate the simulation.

2.3 Model Validation

Simulation outputs must be validated with experimental data. In the case of an ABM simulating FISH, the simulated diffusion times of the tagged oligonucleotide, both across the bacterial envelope and the cytoplasm, must be compared with experimental values, which can be obtained with techniques such as fluorescence recovery after photobleaching (FRAP) [24]. The optimized simulated concentration of uptake, that is, the minimum concentration, which gives out the maximum fluorescence signal, can also be compared with experimental values, by performing FISH and testing different concentrations. If the modeled diffusion times and the ideal concentration of uptake are statistically similar to the corresponding experimental values, the model is considered valid.

3 Practical Example on the Construction of an ABM on FISH

Herein, we present a practical example of the information that should be gathered to construct an ABM model of FISH. This model only considers one E. coli cell. Data were collected for an oligonucleotide typically used in FISH to target the 23S rRNA of E. coli , extracted from ProbeBase (accession no. pB-115 [25]).

The dimensions of E. coli K12, including the thickness of its cell envelope layers, are based on experimental values, notably a length of 2620 nm and a diameter of 680 nm [26]. Regarding the cell envelope, a lipid bilayer has a thickness of around 3–4 nm. In here, a thickness of 4 nm was considered for both the outer membrane and the cytoplasmic membrane, since a fine-grained model including detailed composition of each membrane is out of the scope of this tutorial. The periplasm has a thickness of 11–15 nm ([3] cited by [27]). In here, a thickness of 14 nm was considered. The dimensions of the cytoplasm can be calculated by subtracting the thickness of each layer of the cell envelope from the length and from the radius of the cell, which gives a cytoplasmic length of 2598 nm and a cytoplasmic radius of 658 nm. The model also accounts for the extracellular volume, where the tagged oligonucleotide is positioned at the start of simulation. The viscosity of the extracellular medium is assumed as equal to the viscosity of water (6.913 × 10−4 Pa/s at 37 °C, [28]). Figure 2 represents the simulation environment, modelled as a parallelepiped set to a length of 3144 nm and a width and height of 1032 nm. The dimensions of the parallelepiped were calculated by adding 20% to the length or width of the E. coli cell.

Fig. 3
figure 3

Interactions rules for the considered system. (1) A free tagged oligonucleotide collides with the environment border, leading to a rebound reaction (red arrow). (2) A free tagged oligonucleotide collides with a bound ribosome, leading to a rebound reaction of both molecules (red arrows). (3) A free tagged oligonucleotide collides with a free ribosome, leading to a binding reaction between both molecules

Table 1 contains the information about the ribosomes and its subunits, the oligonucleotide and the fluorophore , i.e., the agents of the model. All molecules were assumed to be spherical in shape, except for fluorescein and the tagged oligonucleotide, which were assumed to have an oblate shape. Fluorescein was set with 0.7 and 0.2 nm as the equatorial and polar semi-axes, respectively [35]. The equatorial axis of the tagged oligonucleotide was calculated as the sum of the equatorial axis of fluorescein and the diameter of the oligonucleotide, and the polar axis was assumed as equal to the diameter of the oligonucleotide. Moreover, its diffusion coefficient was calculated in the extracellular medium and in the cytoplasm, but not in each layer of the bacterial envelope, since these are composed of heterogeneous molecules which form electrostatic interactions with each other and with the oligonucleotide.

Table 1 Data to be introduced in the ABM representing FISH in E. coli

To calculate the number of oligonucleotides in the extracellular medium at the beginning of the simulation, a concentration of 200 nM of oligonucleotides in hybridization solution was assumed. In addition, it was also assumed that 50 μL of this solution is added to the solution containing the permeabilized bacteria (500 μL). With this, the number of oligonucleotides in the considered extracellular volume can be extrapolated. Throughout the entire simulation, the same number of molecules should be present in the extracellular medium represented, which is achieved by simulating constant reposition of molecules, since in a FISH experiment, there is an excess of oligonucleotides compared to the number of cells.

Regarding the interaction rules (Fig. 3), the following apply:

  1. 1.

    If a free tagged oligonucleotide collides with a free ribosome, both molecules stay attached to each other.

  2. 2.

    If two molecules, other than a free tagged oligonucleotide and a free ribosome, collide with each other, a rebound interaction occurs.

  3. 3.

    If a tagged oligonucleotide collides with the environment borders, a rebound interaction occurs.

4 Notes

Due to the incompleteness of data, especially of experimental diffusion values, and inconsistencies between sources, assumptions have to be made at this stage.

  1. 1.

    In the event that data are not available for the chosen microorganism, information can be obtained by phylogenetic comparison or following other criteria of similarity adequate for that particular information.

  2. 2.

    When data cannot be collected for a certain molecule , the search can be directed to other molecules with similar physicochemical characteristics and size.

  3. 3.

    Data should be scaled to the same units of measure. That is, do not use values in nm and μm in the same simulation.

  4. 4.

    If choosing a parallel or a spatially partitioned approach, select the number of threads, that is, the number of agents the simulator should iterate at the same time.

  5. 5.

    If choosing a spatially partitioned approach, select the number of partitions, that is, the number of divisions of the environment.

  6. 6.

    When applying a partitioned or spatially partitioned approach, it is recommended to select at least as many threads as cores are available in the host machine, since the highest the number of cores, the more set of instructions can the machine process at the same time.