Optimisation and parallelisation strategies for Monte Carlo simulation of HIV infection

https://doi.org/10.1016/j.compbiomed.2006.06.010Get rights and content

Abstract

In recent years, the study of immune response behaviour through mathematical and computational models has attracted considerable efforts. The dynamics of key cell types, and their interactions, has been a primary focus in terms of building a picture of how the immune system responds to a threat. Discrete methods, based on lattice Monte-Carlo (MC) models, with their flexibility and relative simplicity have previously been used to model the immune system behaviour. However, due to speed and memory constraints, large-scale simulations cannot be done on a single computer. Key issues in the reduction of simulation time are code optimisation and code parallelisation. In this paper, optimisation and parallelisation solutions are discussed, with reference to existing MC simulation code for dynamics of HIV infection.

Introduction

Understanding the behaviour of immune system is a key issue for immunologists and medical practitioners in seeking new treatments for persistent diseases. Most of the time, particularly for lethal bacteria or viruses, in vivo experiments cannot be performed, hence, models are built to simulate part of the real-life behaviour of the system. Analysis of the simulation results leads to the discovery of new behavioural rules that can be used in exploring the potential for new drugs and treatments.

The immune response is the natural defense of the body against foreign entities that invade the host cells, causing minor infections to major diseases. From antigen infection to elimination, the immune response is a complicated process that involves the interactions of a variety of immune cells such as: macrophages (responsible for phagocytosis of pathogens, dead cells and cellular debris), cytotoxic T cells and helper T cells (types of white blood cell or leukocyte with surface antigen receptors that can bind to fragments of antigens). Additionally, the immune repertoire includes plasma B cells, secreting antibodies, which destroy antigens by binding to them and making them easier targets for phagocytes. Further defense features include memory B cells that are specifically primed by the antigen(s) encountered during the primary immune response; these memory cells are able to live for a long time and can respond quickly upon second exposure to an antigen for which they are specific. Finally, the system contains the viral cells and antibodies which react to them. The immune system has been modelled in several ways, featuring different subsets of cells, together with interactions for describing progression and inhibition of an infection. See Kuby [1] and Roitt [2] for an in-depth introduction to immunology.

Cellular automata (CA) models were first introduced by Neumann [3] in the late 1940s to describe elementary units that can reproduce themselves. CA are used to mimic the global behaviour of a system from its local rules and gave good results in early studies on biological, chemical and physical systems [4], [5].

Over the last several decades they have been applied to many different problems, such as heat and wave equations [6], forest fire spread [7], railway traffic [8] or even cryptography [9]. Typically defined on an n-dimensional grid (n1), discrete CA are able to describe continuous dynamical systems through a set of simple rules. Lattice CA are characterised by the following properties:

  • An n-dimensional discrete lattice of cells.

  • A cell state chosen from a finite set of states.

  • A neighbourhood (the set of cells that can interact with each given cell).

  • A set of rules determining states of a cell and its neighbours.

  • The evolution, in discrete time steps, of the cellular state.

For probabilistic models, the following property is added:
  • Any rule can depend on a specified probability.

Stochastic CA widen the range of dynamical systems that can be modelled.

Monte-Carlo (MC) methods are statistical simulation methods, using random numbers to give approximate solutions to mathematical problems. Developed by Neumann [3], Ulam [10] and Metropolis [11] during World War Two, this method was first used to model neutron diffusion in fissile material and give an approximation of the eigenvalues of the Schrödinger equation [12]. Since then MC methods have been used to model a large variety of problems, from pi estimation [13] to immune system simulation [14], [15].

In the case of HIV modelling, the MC method can be used with CA to mimic different aspects of immune system behaviour and allow for stochastic updating. Thus, for a 3D lattice of size l, L=l3 sites will be randomly updated each MC run. To produce good statistics, several MC runs for each initial situation of seeding are common. The codes considered in this article [16], [17] typically average over 10 runs which is a good compromise considering both statistics and the increase in execution time. A simple MC–CA HIV simulation structure can then be represented as in Fig. 1.

The nrun loop is the one that repeats the overall simulation several times to ensure good statistics. The nMonteCarloSteps is the loop realising a complete MC simulation, the number of steps determining the evolution period of the simulation system. The allSites loop is the heart of the simulation, in which all sites will be randomly updated. For typical MC HIV simulation:

  • nrun1.

  • nMonteCarloSteps1000.

  • allSites=latticesize10.

Usually these three are chosen as large as possible, and are only limited by available computational power and memory. In a typical HIV-positive person, up to 1010 viruses are produced daily, with 108 new cell infections and 107 virus mutations [18], [19], and a realistic representation of these orders of magnitude are required from the simulation model. Hence, for 2D simulation, the site can be updated from 105 to 1012 times, in 3D, from 106 to 1015 times.

The allSites loop presented here is a basic element; in fact the full update of the lattice during an MC step may require more than one loop that increases further the amount of computation to be done.

Therefore, particularly in models of biosystems such as the immune response, which require large-scale simulation, memory and speed issues are crucial. In this paper some optimisation and parallelisation solutions are presented for existing MC HIV simulation code [16], [17]. These models and corresponding code for these two examples will be referred to as A and B, respectively, in what follows.

Section snippets

Description of the CA models and MC methods considered

Numerous CA models have been built to simulate immune system response to HIV infection. Early models of cell-mediated immune response were based on three key cell types, e.g. Pandey and Stauffer (PS1) [20], Pandey (P1) [21], Kougias and Schulte (KS1) [22], considering the helper cell (H), the cytotoxic killer cell (C) and the viral-infected cell (V). Further cell-mediated models include the five cell Pandey and Stauffer model (PS2) [23] or eight cell Pandey (P2) [24]. Other variants were

Choices of optimisation technique

In this section, we discuss optimisation that can be done on existing code. Of course these are not universal rules as optimisation has proven to be implementation-dependent, with each particular problem being optimisable in a different way. In fact, working on existing code requires us to respect existing algorithms in order to preserve the underlying model and obtain similar results to those of the original code. Even if hardware-optimised code in assembly language offered the best solution

Choices of parallelisation techniques

A look back at the MC loop described in the Introduction, Fig. 1, gives clues on how to parallelise the computation. Due to the independence of each run (each one being performed on a newly seeded lattice with a different sequence of random updates), the nrun loop can be made parallel. This is relatively easy to do, giving each processor a number of runs to execute, then gathering the data of each computation at the end. The nMonteCarloSteps loop cannot be parallelised as each step is

Optimisation efficiency

To obtain an estimate of efficiency, we use a form of speed-up measure, which simply compares execution time of original code and optimised codes:S=TorigTopt.In the case of code A [16], the code optimisation techniques used were the following:

  • Code was rewritten in C, in order to improve optimisation capability.

  • Compiler optimisation -O3 was used, as it was found to be most efficient, -O4 and -O5 giving no further improvement.

  • Four Boolean L2 lattices were replaced by a single char L2 lattice,

Discussion

The optimisation issues described above generalise to many problems, although implementation details vary and are typically problem-specific. In studies on the various types of disorder in materials, for example, the natural length scale is that of the basic constituents of the material and in most solids this is microscopic, with disorder usually topological, while in fibrous composites (mesoscopic scale), disorder depends on fibre orientation and distribution, which leads at larger scale to

Conclusion

The objective of the optimisation and parallelisation of HIV infection simulation code is to find optimal ways to reduce computation time of CA and MC simulations. While this article is not meant to give general optimisation and parallelisation techniques for biocomputational simulations, it does suggest the most effective parallelisation method for parallel execution of CA and MC codes. Methods using spatial parallelisation have a big communication overload as the exchanges done to update each

Acknowledgements

The work described in this paper was supported by a grant from the National Institute for Cellular Biotechnology (NICB) under the Higher Education (HEA) Programme for Research in Third Level Institutions (PRTLI). I would like to thank my supervisors Prof. Heather Ruskin and Dr. Martin Crane for all their patience and advice.

Damien Hecquet received his M.Sc. in Computing (2005) in the Institut National des Sciences Appliquées de Lyon, France. He is currently working as an Information System Software Designer for Capgemini.

References (36)

  • I.M. Roitt

    Essential Immunology

    (1994)
  • J.V. Neumann
  • S. Wolfram

    Cellular Automata and Complexity: Collected Papers

    (1994)
  • H. Gutowitz

    Cellular Automata: Theory and Experiment

    Published in Physica D

    (1990)
  • T. Toffoli

    Cellular automata as an alternative to (rather than an approximation of) differential equations in modeling physics

    Physica D

    (1994)
  • L. KePing et al.

    Cellular automaton model for railway traffic

    J. Comput. Phys.

    (2005)
  • N. Metropolis et al.

    The Monte Carlo method

    J. Am. Stat. Assoc.

    (1949)
  • N. Metropolis

    The beginning of the Monte Carlo method

    Los Alamos Sci.

    (1987)
  • Cited by (8)

    View all citing articles on Scopus

    Damien Hecquet received his M.Sc. in Computing (2005) in the Institut National des Sciences Appliquées de Lyon, France. He is currently working as an Information System Software Designer for Capgemini.

    Heather J. Ruskin received her B.Sc. degree in Physics and M.Sc. in Medical Statistics from London University (Kings/London School of Hygiene and Tropical Medicine) and her Ph.D. in Statistical and Computational Physics from Trinity College Dublin. She is currently a Professor in the School of Computing and Associate Dean of Research in Engineering and Computing in Dublin City University. Her research interests include Computational Models for Complex Systems; spatiotemporal processes and many-body problems in biosystems (biomimetics) and in socioeconomic systems (traffic and finance).

    Martin Crane received his B.A. and B.A.I. (Mechanical Engineering) degrees from Trinity College Dublin in 1989 and his Ph.D. from the same institution in 1993. He has worked in a variety of areas of Computational Science such as CFD, Combustion Modelling, Financial Data Analysis and, more recently, Systems Biology. He has been a College Lecturer in Dublin City University since 1999.

    View full text