1 Introduction

Bioinformatics is the field concerned with analyzing biological information by means of computers and statistical techniques, covering tasks such as link prediction and classification (Almansoori et al. 2012) and disease detection (Tang et al. 2012), among others. Predicting the three-dimensional structure of a protein from its linear sequence is currently a great challenge in computational biology. The problem can be described as the prediction of the three-dimensional structure of a protein from its amino acid sequence, that is, the prediction of a protein's tertiary structure from its primary structure. Methods for protein structure prediction fall into two categories: experimental and computational. The two main experimental methods are X-ray crystallography and nuclear magnetic resonance (NMR). Unfortunately, these methods are not efficient enough, being both expensive and time-consuming (Abual-Rub and Abdullah 2008).

There are currently three main categories of computational methods for protein structure prediction. These categories depend mainly on the degree of similarity between the input protein sequence and the sequences already stored in the database. The first is homology modeling, also known as comparative modeling. It is used when there is similarity between the target sequence and sequences that already exist in the protein database (Chothia and Lesk 1986). The second is fold recognition, also known as protein threading, which is an inverse of the protein folding problem. It is based on the fact that the number of new protein folds is not growing as fast as the number of new protein sequences, which leads to the observation that any newly predicted structure will almost always fold into a structure similar to one already in the database. The third computational category is ab initio modeling. It seeks to predict the tertiary structure of a protein from its amino acid sequence alone, without any knowledge of similar folds. Ab initio modeling, also known as de novo modeling, free modeling, or physics-based modeling (Lee et al. 2009), is based on the thermodynamic hypothesis, which states that the tertiary structure of a protein is the conformation with the lowest free energy (Anfinsen 1973).

Ab initio modeling, however, is challenging for the following reasons. First, a huge number of proteins have no homology with any protein of known structure. Second, some proteins that show high homology with other proteins nevertheless have different structures. Third, comparative modeling does not offer any insight into why a protein adopts a specific structure (Helles 2008). A successful ab initio method for protein structure prediction depends on a powerful conformational search method to find the minimum energy of a given energy function. Molecular dynamics (MD), Monte Carlo (MC), and genetic algorithms (GA) are common methods for exploring the protein conformational search space.

The harmony search algorithm (HSA) is a recent meta-heuristic population-based optimization algorithm that mimics the improvisation process in a musical context (Geem et al. 2001). It has particular advantages over traditional optimization techniques: it imposes fewer mathematical requirements and does not need initial value settings for the decision variables; it considers all the existing vectors when generating a new vector, whereas methods such as the genetic algorithm (GA) consider only two parent vectors; and it does not need to encode and decode the decision variables into binary strings (Mahdavi and Abolhassani 2009). These advantages have enabled it to be used successfully for a wide variety of optimization problems such as RNA secondary structure prediction (Mohsen et al. 2010), timetabling (Al-Betar and Khader 2012; Al-Betar et al. 2010a, b, c), structural engineering (Saka et al. 2011), and many others, as overviewed by Ingram and Zhang (2009) and Alia and Mandava (2011). Furthermore, the structure and performance of the HSA are still being developed to address ongoing challenges. It has been hybridized with other successful optimization techniques such as GA (Nadi et al. 2010), particle swarm optimization (PSO) (Omran and Mahdavi 2008), and hill climbing (Al-Betar et al. 2012b). Its parameters have also been adapted deterministically during the search (Mahdavi et al. 2007; Geem and Sim 2010; Pan et al. 2010; Alatas 2010). Quite recently, several mathematical analysis studies have investigated the exploratory power of HSA (Das et al. 2011; Al-Betar et al. 2012a).

The main objective of this paper is twofold: (1) to adapt HSA for ab initio protein tertiary structure prediction (PSPP), which can serve as an initial study of applying this algorithm to the problem (the resulting method is henceforth called the adaptive harmony search algorithm, AHSA); and (2) to hybridize iterated local search (ILS) within the process of the AHSA to improve its local exploitation and to incorporate the global-best concept of PSO in the memory consideration to improve the convergence speed (henceforth called the hybrid harmony search algorithm, HHSA). Using a well-studied benchmark established for PSPP, the results show that the AHSA is able to provide solutions of competitively good quality. Interestingly, HHSA is able to yield more precise results than those of the comparative methods.

2 Materials and methods

2.1 Problem description

The PSPP considered in this paper is the ab initio protein tertiary structure prediction problem, which predicts the tertiary structure of a protein from its amino acid sequence alone. It is based on the thermodynamic hypothesis (Anfinsen 1973), which states that the tertiary structure of the protein is the conformation with the lowest free energy. Thus, the PSPP can be formulated as an optimization problem whose basic objective is to find the conformation of the protein with the lowest energy. The most important tasks in solving the protein structure prediction problem using an optimization algorithm are to choose an applicable representation of the conformation and a suitable energy function. The problem modeling is introduced based on these two factors in the following two sections.

2.2 Problem modeling

2.2.1 Problem representation

There are several common representations of polypeptide chains (Cutello et al. 2006): Cα coordinates, all-heavy-atom coordinates, all-atom three-dimensional coordinates, backbone atom coordinates + side-chain centroid, and backbone and side-chain torsion angles.

The most detailed representation is the one that includes all atoms of the protein. It is worth mentioning that representing all these atoms with their interactions is computationally expensive and not essential during the search process (Chivian et al. 2003). Hence, to reduce the computational time and space, many researchers, such as Levitt (1976), Abagyan and Maiorov (1988), Hinds and Levitt (1992), Baker (2000), and Dudek and Objects (2007), have used a simplified representation. This research likewise uses a simplified representation of the polypeptide chain, namely the backbone and side-chain torsion angle representation, based on the fact that each residue type requires a fixed number of torsion angles to fix the three-dimensional coordinates of all its atoms.

The protein is represented in this research as a vector of amino acids, \({\user2{x}} = ({\user2{x}}_1, {\user2{x}}_2, \ldots, {\user2{x}}_M),\) where \({\user2{x}}_1=(\Upphi_{1}(1), \Uppsi_{1}(1), \omega_{1}(1), x_{1}(1), x_{1}(2),\ldots ,x_{1}(N)),\) \({\user2{x}}_2=(\Upphi_{2}(1), \Uppsi_{2}(1), \omega_{2}(1), x_{2}(1), x_{2}(2),\ldots, x_{2}(N)),\ldots,\) \({\user2{x}}_M=(\Upphi_{M}(1), \Uppsi_{M}(1), \omega_{M}(1), x_{M}(1), x_{M}(2), \ldots, x_{M}(N)).\) However, AHSA deals with this vector as a single vector of torsion angles \({\user2{x}}=(x(1), x(2),\ldots,x(N)),\) where N is the total number of torsion angles in the protein, and each angle x(i) in this vector can be assigned a value within the range \([-\pi, \pi].\) Therefore, the solution length is equal to the number of torsion angles in the protein. Each amino acid contributes two kinds of angles: (1) the main chain angles \((\Upphi, \Uppsi, \omega)\) and (2) the side chain angles. Every amino acid includes the main chain angles, while the number of side chain angles depends on the amino acid type.
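
As a concrete illustration of this encoding (not the authors' implementation), a minimal Python sketch of how such a torsion-angle vector could be generated is shown below; the helper name random_conformation is hypothetical:

    import math
    import random

    def random_conformation(n_angles):
        # A candidate solution is simply a list of n_angles torsion angles,
        # each drawn uniformly from [-pi, pi].
        return [random.uniform(-math.pi, math.pi) for _ in range(n_angles)]

    # Met-enkephalin, for example, is described by 24 torsion angles,
    # so a candidate solution is a vector of length 24.
    x = random_conformation(24)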

2.2.2 Energy function

There are many well-known physics-based force fields, including CHARMM by Brooks et al. (1983), AMBER by Weiner et al. (1984), and OPLS by Jorgensen and Tirado-Rives (1988). Unfortunately, these force field packages are complex and difficult for users to modify to accommodate their own algorithms (Eisenmenger et al. 2006). Moreover, many researchers need an energy function that can be easily modified and adapted to their specific needs; such an energy function also needs to be efficient for evaluation studies. Therefore, the energy function used in this research is Simple Molecular Mechanics for Proteins (SMMP), a modern package for the simulation of proteins. This force field package was established by Eisenmenger et al. (2001) and revised by Eisenmenger et al. (2006). This research uses the revised version.

Many reasons can be given for this choice. First, the program is fast and can be used successfully even on a single PC. Second, its code is free and open source. Third, the code is simple and can be modified and adapted by users to meet their specific needs. Fourth, the program does not contain any machine-dependent routines. Finally, it has been tested in many simulations of small peptides and has proved to be a convenient and effective tool for numerical investigations of proteins (Eisenmenger et al. 2006). In SMMP, the protein molecule is described by a set of internal coordinates, in which the dihedral angles (\(\Upphi, \Uppsi, \omega\)) that describe rotations around the chemical bonds in the backbone of the amino acids, as well as the dihedral angles \({\user2{X}}_i\) in the side chains, are flexible. A set of energy minimization routines is used, based on the ECEPP force field with two different parameter sets for calculating the internal energy: the ECEPP/2 potential and the ECEPP/3 potential.

SMMP uses the following energy function (Eisenmenger et al. 2001):

$$ \min {\qquad f({{\mathbf{x}}})=E_{\rm{LJ}}+E_{\rm{el}}+ E_{\rm{hb}}+ E_{\rm{tors}}} $$
(1)

where

$$ E_{\rm{LJ}}= \sum_{j>i}{\left(\frac{{A}_{ij}}{r^{12}_{ij}}-\frac{{B}_{ij}}{r^{6}_{ij}}\right)} $$
(2)
$$ E_{\rm{el}}= 332\sum_{j>i}{ \frac{q_i \times q_j}{\varepsilon \times r_ {ij}}} $$
(3)
$$ E_{\rm{hb}}= \sum_{j>i}{\left(\frac{{C}_{ij}}{r^{12}_{ij}}-\frac{{D}_{ij}}{r^{10}_{ij}}\right)} $$
(4)
$$ E_{\rm{tors}}= \sum_{n}{U_n(1\pm \cos(k_n \times \varphi_n))} $$
(5)

where \(r_{ij}\) refers to the distance in Å between atoms i and j, while \(A_{ij}, B_{ij}, C_{ij}, D_{ij}\) are the empirical potential parameters. The two variables \(q_i\) and \(q_j\) refer to the partial charges of atoms i and j, and \(\varepsilon\) is the dielectric constant of the environment, recommended to be \(\varepsilon=2.\) The factor 332 in (3) is used to express the energy in kcal/mol. \(U_n\) is the energetic torsion barrier of rotation about the bond n, and \(k_n\) is the multiplicity of the torsion angle \(\varphi_n.\) It is important to note that all torsion angles (main chain and side chain) contribute to this formula. In the end, the energy function is a function of the N torsion angles of a given protein.
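
For illustration only, the following Python sketch shows how the four terms in (1)–(5) could be accumulated, assuming the pairwise parameters A, B, C, D, the partial charges q, and the torsion parameters U, k, φ have already been extracted, and assuming each atom pair is treated either as a Lennard-Jones pair or as a hydrogen-bonding pair; in practice SMMP computes all of this internally:

    import math

    def ecepp_like_energy(pairs, torsions, eps=2.0):
        # Illustrative sketch of Eqs. (1)-(5); not the SMMP implementation.
        # pairs    : iterable of (r_ij, A_ij, B_ij, C_ij, D_ij, q_i, q_j, is_hbond)
        # torsions : iterable of (U_n, k_n, phi_n, sign) with sign = +1 or -1
        # eps      : dielectric constant (the paper recommends eps = 2)
        e_lj = e_el = e_hb = e_tors = 0.0
        for r, A, B, C, D, qi, qj, is_hbond in pairs:
            e_el += 332.0 * qi * qj / (eps * r)            # Eq. (3), kcal/mol
            if is_hbond:
                e_hb += C / r**12 - D / r**10              # Eq. (4)
            else:
                e_lj += A / r**12 - B / r**6               # Eq. (2)
        for U, k, phi, sign in torsions:
            e_tors += U * (1 + sign * math.cos(k * phi))   # Eq. (5)
        return e_lj + e_el + e_hb + e_tors                 # Eq. (1)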

2.3 Harmony search algorithm

The harmony search algorithm (HSA) is a soft computing metaheuristic inspired by the improvisation process of musicians. In a musical improvisation process, a group of musicians play the pitches of their musical instruments seeking a perfect harmony as judged by aesthetic standards. Similarly, in the optimization context, this process is formulated as follows: a set of decision variables is assigned values in search of a global optimal solution as evaluated by an objective function. More details about HSA can be found in Al-Betar and Khader (2012) and Al-Betar et al. (2012b).

The HSA procedure has five main steps, described as follows:

Step 1. Initialize the problem and HSA parameters:

The optimization problem can be modeled as \(\min f({\user2{x}})\) s.t. \(x(i)\in {\user2{X}}(i),\) where \(f({\user2{x}})\) is the objective function, \({\user2{x}}=\{x(i)\,|\,i=1,\ldots,N\}\) is the set of decision variables, and N is the number of decision variables. \({\user2{X}}(i)\) contains all the possible values of decision variable i, i.e., \({\rm LB}(i) \leq x(i) \leq {\rm UB}(i),\) where LB(i) and UB(i) are the lower and upper bound values of x(i).

The parameters of the HSA required to solve the optimization problem are also specified in this step:

  • Harmony memory size (HMS) which determines the number of initial solutions.

  • Harmony memory consideration rate (HMCR) which determines whether the value of a decision variable is to be selected from the accumulated solutions stored in the harmony memory or randomly from its possible range.

  • Pitch adjustment rate (PAR) which decides whether a decision variable selected from memory is to be adjusted to a neighboring value or not.

  • Number of improvisations (NI) which is equivalent to the number of iterations in other iterative improvement methods.

Step 2. Initialize the harmony memory.

The harmony memory (HM) is a memory location that stores a number of solution vectors determined by HMS. These solution vectors are randomly generated as \(x_j(i)= {\rm LB}(i)+ U(0,1) \times ({\rm UB}(i)-{\rm LB}(i)), \forall i\in(1,2,\ldots, N)\) and \(\forall j\in(1,2,\ldots, {\rm HMS}),\) where U(0,1) generates a uniform random number between 0 and 1. These vectors are sorted in ascending order according to their objective function values [see (6)].

$$ {\rm{HM}}= \left[ \begin{array}{llll} x_{1}(1) & x_{1}(2) & \cdots & x_{1}(N)\\ x_{2}(1) & x_{2}(2) & \cdots & x_{2}(N)\\ \vdots & \vdots & \ddots & \vdots \\ x_{\rm{HMS}}(1) & x_{\rm{HMS}}(2) & \cdots & x_{\rm{HMS}}(N)\\ \end{array} \right] $$
(6)
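
A minimal sketch of Step 2 in Python, assuming a generic objective function f and the angle range [−π, π] used later in this paper (the function name is ours):

    import math
    import random

    def initialize_harmony_memory(f, n, hms, lb=-math.pi, ub=math.pi):
        # Fill the harmony memory with HMS random vectors of length n and
        # sort it by objective value (lowest energy first), as in Eq. (6).
        hm = [[lb + random.random() * (ub - lb) for _ in range(n)]
              for _ in range(hms)]
        hm.sort(key=f)
        return hm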

Step 3. Improvise a new harmony:

In this step, the HSA generates (or improvises) a new harmony vector, \({\user2{x}}^{\prime} = (x^{\prime}(1), x^{\prime}(2),\ldots , x^{\prime}(N)),\) based on three mechanisms: (1) memory consideration, (2) random consideration, and (3) pitch adjustment.

  1.

    Memory consideration: in memory consideration, the value of the first decision variable \(x^{\prime}(1)\) is randomly assigned from the historical values stored in the HM vectors, such that \(x^{\prime}(1) \in {\{x_1(1),x_2(1),\ldots,x_{\rm HMS}(1)\} }.\) Values of the other decision variables, \((x^{\prime}(2), x^{\prime}(3), \ldots ,x^{\prime}(N)),\) are sequentially assigned in the same manner with a probability of HMCR (HMCR \(\in (0,1)\)). This operator is similar to the recombination operator in other population-based methods and is a good source of exploitation (Yang 2009).

  2.

    Random consideration: random consideration is functionally similar to the mutation operator in genetic algorithms (Yang 2009); it is a source of global exploration in HSA. The decision variables that are not assigned values by memory consideration are assigned random values from their possible range with a probability of (1-HMCR), as illustrated in (7).

    $$ x^{\prime}(i)\leftarrow \left\{\begin{array}{ll} \in \{x_1(i), \ldots, x_{\rm{HMS}}(i)\} & U(0,1)\leq \rm{HMCR} \\ \in {\user2{X}}(i) & {\rm otherwise} \end{array}\right. $$
    (7)

    Note that HMCR and PAR are the main parameters used to control the improvisation process. The HMCR parameter is the probability of assigning the value of a decision variable, \(x^{\prime}(i),\) from the historical values stored in the HM. For example, if HMCR = 0.80, the probability of assigning the value of each decision variable from the historical values stored in the HM vectors is 80 %, while the probability of assigning it randomly from its possible value range is 20 %.

  3.

    Pitch adjustment: the value of every decision variable \(x^{\prime}(i)\) of the new harmony vector, \({\user2{x}}^{\prime}=(x^{\prime}(1),x^{\prime}(2),\ldots,x^{\prime}(N)),\) that has been assigned a value by memory consideration is examined to determine whether or not it should be pitch adjusted with probability PAR (0 ≤ PAR ≤ 1), as follows:

    $$ \hbox {Adjust $x^{\prime}(i) $?} \leftarrow \left\{\begin{array}{ll} \hbox {Yes} & U(0,1)\leq \rm{PAR} \\ \hbox {No} & {\rm otherwise} \end{array}\right. $$
    (8)

    If the pitch adjustment decision for \(x^{\prime}(i)\) is Yes, the value of \(x^{\prime}(i)\) is adjusted to its neighboring value as follows:

    $$ x^{\prime}(i)= x^{\prime}(i) \pm U(0,1)\times \rm{BW} $$
    (9)

    where BW is the distance bandwidth parameter for continuous optimization problems (e.g., PSPP), which normally takes a value in advance and remains constant during the search. A value of BW = 0.01 is recommended (Omran and Mahdavi 2008).
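
Putting the three operators together, the following Python sketch illustrates one improvisation (Step 3) under the assumption of a single continuous range [LB, UB] and a fixed bandwidth BW; it is an illustration of Eqs. (7)–(9), not the authors' code, and the clamping of adjusted values to the bounds is our addition:

    import random

    def improvise(hm, n, hmcr, par, bw, lb, ub):
        # Build one new harmony vector using memory consideration,
        # random consideration, and pitch adjustment (Eqs. (7)-(9)).
        new = []
        for i in range(n):
            if random.random() <= hmcr:
                # memory consideration: reuse a stored value for variable i
                value = random.choice(hm)[i]
                if random.random() <= par:
                    # pitch adjustment: move to a neighbouring value
                    value += random.choice((-1, 1)) * random.random() * bw
                    value = min(max(value, lb), ub)   # keep within bounds (our addition)
            else:
                # random consideration: draw from the full range
                value = lb + random.random() * (ub - lb)
            new.append(value)
        return new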

Step 4. Update the harmony memory:

If the new harmony vector, \({\user2{x}}^{\prime} = (x^{\prime}(1), x^{\prime}(2),\ldots , x^{\prime}(N)),\) is better than the worst harmony vector in harmony memory, the new harmony vector replaces the worst harmony vector.

Step 5. Check the stop criterion:

The HS algorithm repeats steps 3 and 4 until the maximum number of improvisations (NI) is reached.
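
Steps 2–5 then compose into the overall loop. A compact sketch, reusing the hypothetical helpers initialize_harmony_memory and improvise from the sketches above:

    import math

    def harmony_search(f, n, hms=10, hmcr=0.9, par=0.25, bw=0.01,
                       ni=100000, lb=-math.pi, ub=math.pi):
        # Plain HSA loop: improvise NI times, replacing the worst vector
        # whenever the new harmony improves on it (Steps 3-5).
        hm = initialize_harmony_memory(f, n, hms, lb, ub)
        for _ in range(ni):
            x_new = improvise(hm, n, hmcr, par, bw, lb, ub)
            if f(x_new) < f(hm[-1]):     # hm is kept sorted, worst is last
                hm[-1] = x_new
                hm.sort(key=f)
        return hm[0]                     # best harmony found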

Algorithm 1 presents the pseudo-code of the HS algorithm.

figure a

3 Method

This section provides a description of the adaptive harmony search algorithm (AHSA) and of how the HMCR and PAR parameters are iteratively updated. Thereafter, the way the AHSA is improved is presented by proposing the HHSA, which incorporates iterated local search and the global-best concept of PSO into the AHSA.

3.1 AHSA for ab initio PSPP

The protein structure is initialized for the harmony search optimization as follows. For a particular protein sequence picked from the Protein Data Bank, some parameters are extracted from the database. These parameters comprise the number of amino acids (M) and the number of torsion angles (N). For example, for the two benchmark proteins used in this research, 'Met-enkephalin' has 5 amino acids and 24 torsion angles, while '1CRN' has 46 amino acids and 238 torsion angles. The solution is then represented as a vector of amino acids, \({\user2{x}}=({\user2{x}}_1, {\user2{x}}_2, \ldots, {\user2{x}}_{M}),\) where \({\user2{x}}_1=(\Upphi_{1}(1), \Uppsi_{1}(1), \omega_{1}(1), x_{1}(1), x_{1}(2),\ldots ,x_{1}(N)),\) \({\user2{x}}_2=(\Upphi_{2}(1), \Uppsi_{2}(1), \omega_{2}(1), x_{2}(1), x_{2}(2),\ldots, x_{2}(N)), \ldots,\) \({\user2{x}}_{M}=(\Upphi_{M}(1), \Uppsi_{M}(1), \omega_{M}(1), x_{M}(1), x_{M}(2), \ldots, x_{M}(N)).\) However, AHSA deals with this vector as a vector of torsion angles \({\user2{x}}=(x(1), x(2), \ldots, x(N)),\) where N is the number of torsion angles in the protein, and each angle x(i) in this vector can be assigned a value within the range \([-\pi, \pi].\) Therefore, the solution length is equal to the number of torsion angles in the protein.

The harmony memory (HM) is initialized with random vectors, the number of which is determined by HMS. A different random seed is used to generate torsion angles randomly within the range \([-\pi, \pi]\). The objective function \(f({\user2{x}})\) in (1) is used to calculate the energy value of each vector of torsion angles in HM. The vectors (solutions) in HM are sorted in ascending order of their energy values, such that \(f({\user2{x}}_1) \leq f({\user2{x}}_2)\leq \ldots \leq f({\user2{x}}_{{\rm HMS}}).\) In the improvisation step, the AHSA generates a new harmony vector, \({\user2{x}}^{\prime}=(x^{\prime}(1), x^{\prime}(2),\ldots,x^{\prime}(N)),\) based on the three operators discussed in Sect. 2.3: (1) memory consideration, (2) random consideration, and (3) pitch adjustment.

If the new torsion angle vector, \({\user2{x}}^{\prime} = (x^{\prime}(1), x^{\prime}(2),\ldots, x^{\prime}(N)),\) has a better energy than the worst harmony vector in HM, the new vector replaces the worst one in HM. This process is repeated until the maximum number of improvisations is met. At the end of the improvisations, the AHSA passes the torsion angles of the best vector to a procedure that builds the structure of this vector, which is then stored in a protein structure database.

It has to be noted that the HSA iterates toward the optimal solution using two main parameters, HMCR and PAR. The optimization process should balance the concepts of exploration and exploitation; memory consideration is the source of exploitation in HSA, while random consideration is the source of exploration. Increasing HMCR leads the search to tend toward exploitation (Yang 2009). On the other hand, the pitch adjustment operator performs a set of random local changes to the torsion angles that are assigned values by memory consideration. Therefore, the higher the PAR value, the more the search tends toward exploration.

In optimization, previous theories indicate that the search should concentrate on exploration in the early stage of the search, while at the final stage it should concentrate on exploitation (Blum and Roli 2003). This is the very idea upon which the simulated annealing metaheuristic is built. In the search process, the simulated annealing algorithm accepts not only better but also worse neighboring solutions with a certain probability based on two factors: the energy value of the current solution and the temperature (Wang et al. 2001). During the search, simulated annealing reduces the chance of accepting worse solutions by gradually reducing the value of the temperature parameter until the final stage of the search is reached. In the final stage, simulated annealing concentrates on the search space of the current solution by accepting only downhill (or better) moves. This idea of simulated annealing is utilized in the proposed AHSA.

In the HSA described in Sect. 2.3, the HMCR and PAR values are assigned in advance and remain constant during the search process. However, HMCR and PAR help the algorithm find globally and locally improved solutions, respectively (Lee and Geem 2005). After applying the AHSA to the protein structure prediction problem, and after testing different values of HMCR and PAR, it has been observed that when the value of HMCR is high (e.g., 0.99), the AHSA obtains good results and a fast convergence rate, because the probability of selecting the new value from the harmony memory (which already contains improved values) is high. However, this often causes the HSA to get stuck in local optima because the power of exploration is low. On the other hand, a smaller HMCR value can avoid premature convergence, but the search will be slow. Moreover, for the PAR parameter, when the PAR value is high, the value selected from the harmony memory is adjusted with a greater chance; when the PAR value is lower, the probability of changing this value is lower. Therefore, when the PAR value is small, the results will be better but the HSA is also more likely to get stuck in local optima. Moreover, if the value of HMCR is small (e.g., 0.50) and the value of PAR is high (e.g., 0.50), the HSA will not obtain good results.

These observations lead this research to propose a new mechanism for controlling the values of HMCR and PAR. The new mechanism is to update the values of HMCR and PAR dynamically during the search process, rather than fix them in the initial step. After a series of experiments, this study has identified the following assumptions:

  • Assigning a small value to HMCR and a high value to PAR in the first stage of the search increases the power of exploration of the search space and increases diversity of the solutions in HM.

  • Increasing the value of HMCR and decreasing the value of PAR gradually during the search process increases the exploitation. Note that exploration is useful in the first stage of search, while exploitation is more useful in the final stage of search.

The AHSA proposes using two values of HMCR: HMCRmin (the minimum value of HMCR, with which the AHSA starts) and HMCRmax (the maximum value of HMCR, with which the AHSA ends), and two values of PAR: PARmin (the minimum value of PAR, with which the AHSA ends) and PARmax (the maximum value of PAR, with which the AHSA starts). All four parameters are initialized in the first step of the AHSA.

Then, the AHSA updates the values of HMCR and PAR dynamically during the search process as follows.

$$ {\rm{HMCR}}_{i+1}= {\rm{HMCR}}_{i}+ (1-{\rm{HMCR}}_{i}) \times \left(1-e^{-\frac{|f({\user2{x}}_{\rm{best}})|}{t_i}}\right) $$
(10)
$$ {\rm PAR}_{i+1}={\rm PAR}_{i} \times e^{-\frac{|f({\user2{x}}_{\rm{best}})|}{t_i}} $$
(11)

where i is the iteration number, \({\user2{x}}_{\rm{best}}\) is the best solution in HM, i.e., the one with the lowest energy value (lowest is best), and \(t_i\) is a variable equivalent to the temperature in simulated annealing; this variable is reduced gradually by a control parameter, α, such that:

$$ t_{i+1} = \alpha t_i $$

where α is a control parameter smaller than but close to 1.

The two previous equations change the values of both HMCR and PAR slowly because they use the best energy value (which decreases with the iterations) in the exponential function. This means that the value of the exponential function depends on the best energy of the previous iteration. Therefore, if the value of the best energy is high, the value of the exponential function will be low, which means a faster change in the PAR parameter. As the search proceeds, the energy value decreases, which leads to a slower decrease in PAR. For HMCR, the change is slow because the equation multiplies the exponential term by (1-HMCR). It is important to highlight that the energy starts with very high values for long proteins and lower values for short proteins. Therefore, to guarantee a slow change in both HMCR and PAR, the value of \(t_i\) in the denominator of the fraction must be adjusted; \(t_i\) is set to a high value for long proteins and to a smaller value for short proteins. This also applies to the value of α: selecting this value close to 1 leads to a slow decrease in \(t_i\) and thus to a slow change in both PAR and HMCR.

The above two equations, (10) and (11), are derived based on experiments and on observing the change of the PAR and HMCR values during the optimization process. This way of updating the values of HMCR and PAR throughout the search process is inspired by the Monte Carlo acceptance rule of the simulated annealing approach (Kirkpatrick et al. 1983).

Applying the previous way of controlling the values of PAR and HMCR dynamically enables the AHSA to concentrate more on exploration in the early stages of the search by assigning a large value to PAR and a small value to HMCR. Then, during the search, the value of HMCR is increased exponentially while the value of PAR is decreased exponentially, leading the search to concentrate on exploitation.

In (10), the value of HMCR is increased exponentially from HMCRmin with the iteration number until it reaches HMCRmax, and in (11), the value of PAR is decreased exponentially from PARmax every iteration until it reaches PARmin. Algorithm 2 shows the pseudo-code of the AHSA steps.
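
As an illustration of (10), (11), and the cooling step, a possible Python sketch of one AHSA parameter update is given below; the clamping to HMCRmax and PARmin reflects the description above, and all names are ours rather than taken from the paper's code:

    import math

    def update_parameters(hmcr, par, best_energy, t, alpha,
                          hmcr_max=0.99, par_min=0.05):
        # One AHSA update: HMCR rises toward HMCR_max, PAR decays toward
        # PAR_min, and the temperature-like variable t is reduced by alpha.
        factor = math.exp(-abs(best_energy) / t)
        hmcr = min(hmcr + (1.0 - hmcr) * (1.0 - factor), hmcr_max)  # Eq. (10)
        par = max(par * factor, par_min)                            # Eq. (11)
        t *= alpha                                                  # t_{i+1} = alpha * t_i
        return hmcr, par, t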

figure b

3.2 HHSA for ab initio PSPP

In the AHSA, the memory consideration is responsible for the global improvement. The new harmony can be improved by focusing on the good solutions stored in the harmony memory by means of the natural selection principle of 'survival of the fittest'. Additionally, the source of local improvement in HSA is the pitch adjustment operator, which locally adjusts the values of the variables (i.e., the torsion angles) to neighboring values in the hope of affecting the energy function positively. Recall that the pitch adjustment performs a random adjustment that is not guided by the energy function. Due to the complex nature of the PSPP, this random adjustment cannot be guaranteed to improve the new harmony solution enough to reach the local optimal solution. This shortcoming of the AHSA has led this research to propose hybridizing a local search method within the AHSA, which is a population-based method. Several previous studies have shown much interest in hybridizing local search-based algorithms within population-based algorithms (Blum and Roli 2003).

Therefore, an HHSA is proposed in this research with two modifications to the AHSA: first, the ILS is incorporated into the AHSA to help find the local optimal solution within the search space of the new harmony; second, the global-best concept of PSO is incorporated into the memory consideration as a selection mechanism that selects values from the best vector in the harmony memory instead of selecting them randomly. These two concepts, ILS and the global best of PSO, enhance the local exploitation capability and the convergence speed of the AHSA, respectively.

The process of hybridizing the ILS with the AHSA is explained in Sect. 3.2.1 while the global-best memory consideration is explained in Sect. 3.2.2.

3.2.1 Hybridizing with iterated local search

Algorithm 3 illustrates how the ILS is incorporated within the AHSA; the ILS is called after the improvisation step of the AHSA and works as a new operator that fine-tunes the new harmony in each improvisation toward the local optimal solution of the region to which the HSA converges. The initial solution for the ILS is the new harmony solution \({\user2{x}}^{\prime}\) generated by the AHSA operators (i.e., memory consideration, random consideration, and pitch adjustment).

figure c

The pseudo-code of the ILS function is described in Algorithm 4. During the improvement loop in Line 2, the function \(Explore (\mathcal{N}({\user2{x}}^{\prime}))\) navigates the neighboring solutions \(\mathcal{N}({\user2{x}}^{\prime})\) of \({\user2{x}}^{\prime}\) and moves to the first neighboring solution \({\user2{x}}^{\prime\prime} \in \mathcal{N}({\user2{x}}^{\prime})\) that has an equal or lower energy value, i.e., \(f({\user2{x}}^{\prime\prime})\leq f({\user2{x}}^{\prime}).\) This process is repeated until no further improvement is obtained.

The function \(Explore (\mathcal{N}({\user2{x}}^{\prime}))\) explores the search space of \({\user2{x}}^{\prime}\) using the following criteria:

$$ x^{\prime}(i)=x^{\prime}(i) \pm \alpha, \quad \forall i \in (1,2,\ldots,N) $$

where α = bw × U(0,1), bw is an arbitrary distance bandwidth for the continuous variables, and U(0,1) generates a uniformly distributed number between 0 and 1. Note that bw takes a static value within the range [−π, π]. In this way, the ILS provides a guided method for local exploitation.
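
A minimal sketch of this refinement in Python is given below; the stopping rule based on a fixed number of consecutive non-improving trials (max_fails) is our assumption, since the paper only states that the process is repeated until no further improvement is obtained:

    import math
    import random

    def iterated_local_search(f, x, bw, max_fails=100, lb=-math.pi, ub=math.pi):
        # First-improvement descent around the new harmony x: every angle is
        # perturbed by +/- bw * U(0,1); a neighbour with equal or lower energy
        # is accepted, and the search stops after max_fails consecutive
        # non-improving trials (max_fails is our assumption, not from the paper).
        x, fx, fails = list(x), f(x), 0
        while fails < max_fails:
            neighbour = [min(max(v + random.choice((-1, 1)) * random.random() * bw, lb), ub)
                         for v in x]
            fn = f(neighbour)
            if fn < fx:
                x, fx, fails = neighbour, fn, 0   # strictly better: restart the counter
            elif fn == fx:
                x, fails = neighbour, fails + 1   # equal energy: accept but count as no gain
            else:
                fails += 1                        # worse: discard the neighbour
        return x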

figure d

3.2.2 Hybridizing with global-best memory consideration

The HHSA incorporates a new parameter called the particle swarm rate (PSR) to control the global-best memory consideration (GBMC) proposed in the HHSA. The GBMC enhances the way the torsion angle values are selected from the solutions stored in HM and subsequently enhances the performance of the HHSA. During the optimization process of the HHSA, the new value is selected either from the memory with a probability of HMCR or randomly with a probability of (1-HMCR). The new parameter PSR is used within the probability of selecting the new value from the harmony memory (i.e., within HMCR). Thus, instead of picking the value randomly from any vector in the harmony memory, as in the classical harmony search, it is selected either from any vector in the harmony memory with a probability of PSR or from the best solution in the harmony memory with a probability of (1-PSR). Algorithm 5 describes this procedure within the memory consideration operation of the improvisation step. This new operation increases the convergence rate of the HHSA.
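
A short sketch of this modified memory consideration, assuming the harmony memory hm is kept sorted with the best (lowest-energy) vector first; an illustration of Algorithm 5 rather than the published code:

    import random

    def global_best_memory_consideration(hm, i, psr):
        # Pick the value of decision variable i either from a random vector
        # in the harmony memory (probability PSR) or from the best vector,
        # hm[0] (probability 1 - PSR). hm is assumed sorted best-first.
        if random.random() <= psr:
            return random.choice(hm)[i]   # classical memory consideration
        return hm[0][i]                   # global-best selection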

figure e

4 Results and discussion

4.1 Benchmark

The AHSA and HHSA have been evaluated using two protein sequences publicly available at 'www.pdb.org'. The first protein sequence is called 'Met-enkephalin' (or 1PLW in the PDB). It was first identified from the enkephalin mixture of the brain and is involved in a variety of physiological processes. This sequence is one of the most widely used model peptides; it has a short sequence of five amino acid residues, Tyr1-Gly2-Gly3-Phe4-Met5, and consists of 75 atoms described by 24 independent backbone and side chain dihedral angles. It has therefore been used to evaluate many state-of-the-art methods to which the algorithms proposed in this paper are compared. Although it is a small peptide, Met-enkephalin has a complex conformational space with more than \(10^{11}\) local minima (Li and Scheraga 1988). Met-enkephalin has been extensively studied computationally and has been regarded as a benchmark model; it has also been used frequently for testing simulation methods in recent years because of the complexity of its configuration space and because its short sequence allows significant computational studies (Zhan et al. 2006).

The second sequence, '1CRN', is a plant seed protein which has 46 amino acids described by 238 torsion angles.

4.2 Experimental design

A series of experiments has been conducted to measure the influence of different parameters on the performance of the proposed HHSA. Twelve cases of different parameter settings have been selected, as indicated in Table 1. Each case has been repeated for 30 runs for both benchmark protein sequences. The first six cases have smaller HMS values than the last six, in order to show the impact of HMS on the behavior of the HHSA.

Table 1 Cases used to evaluate the HHSA convergence ability

The HMCRmin is set, in different cases, to a large (i.e., 0.90) or a small (i.e., 0.50) value. Recall that HMCR is the rate of selecting values from the harmony memory. The larger the value of HMCR, the more the exploitation increases and the exploration decreases. When HMCRmin has a small value, the algorithm emphasizes exploration with minimal exploitation at the initial stage of the search. In contrast, a large HMCRmin leads to more exploitation in the initial stage of the search.

PAR is the rate of applying the pitch adjustment operator. The values of PARmin and PARmax are fixed across all convergence cases. PARmax initially takes a high value and is decreased exponentially to reach a high exploitation power at the final stage of the search. PARmin is fixed to 0.05 and PARmax is fixed to 0.25 for all 12 convergence cases.

The PSR is the rate of selecting the value of an angle either randomly from any vector in the harmony memory or from the best vector (the one with the lowest energy value) in the harmony memory. Three different values of PSR have been experimented with: low (i.e., 0.10), medium (i.e., 0.50), and high (i.e., 0.90). A PSR value of 0.10 means that the memory consideration selects the value of an angle randomly from any solution in the harmony memory with a probability of 10 %, and from the best vector in the harmony memory with a probability of 90 %.

As for the number of improvisations (NI), it is fixed to NI = 100,000 in all 12 cases, and HMCRmax = 0.99.

4.3 Experimental results

Table 2 shows the best, average, worst, and standard deviation of the energy values obtained for both 'Met-enkephalin' and '1CRN' in the 12 cases, over 30 runs for each case. The best results are highlighted in bold, while the best averages are highlighted in italics.

Table 2 Results of HHSA for 30 runs of the 12 cases

The boxplots in Figs. 1 and 2 visualize the distribution of the energy values obtained in the 30 runs of the 12 convergence cases for 'Met-enkephalin' and '1CRN', respectively. Each boxplot summarizes the following statistical measures: the median, the upper and lower quartiles, and the minimum and maximum energy values. In the boxplot, the larger the range between the first and third quartiles, the worse the search accuracy of the convergence case.

Fig. 1
figure 1

Boxplot showing the distribution of the results for 30 experiments done for each convergence case for Met-enkephalin

Fig. 2
figure 2

Boxplot showing the distribution of the results for 30 experiments done for each convergence case for 1CRN

4.4 Discussion

The experimental results in Table 2 and the boxplots in Figs. 1 and 2 describe the behavior of the HHSA during the search process for all 12 convergence cases. This section provides a discussion of the behavior of the HHSA based on Table 2 and Figs. 1 and 2.

Cases 7–12 are similar to cases 1–6 but with different HMS values; while HMS is set to a small value (i.e., 10) in the first six cases, it is set to 50 in the last six. This is meant to show the impact of HMS on the behavior of the algorithm. In general, the cases with HMS = 10 obtained better results than those with HMS = 50 when the other parameters were fixed. The experiments also show that, for the short protein 'Met-enkephalin', the cases with PSR = 0.10 recorded better results than those with PSR = 0.50 or 0.90, whereas for the longer protein '1CRN', the cases with PSR = 0.90 recorded better results than those with PSR = 0.10 or 0.50. Recall that PSR = 0.10 means that the torsion angle is selected from the best solution in the harmony memory with a probability of 90 %.

Both Table 2 and the boxplot in Fig. 1 indicate that cases 4 and 5 obtained the best results for the 'Met-enkephalin' protein. Although cases 1, 7, and 10 obtained the lowest energy recorded for Met-enkephalin, the boxplot indicates that cases 4 and 5 are better because most of their energy values lie within the range −11.3 to −12 and there are not many extreme values far from the average; besides, both cases 4 and 5 obtained better averages, with an advantage of case 5 over case 4. For '1CRN', both Table 2 and the boxplot in Fig. 2 indicate that cases 5 and 6 obtained the best energy values, with an advantage of case 6 over case 5; the distribution of case 6 is better than that of case 5 because most of its energy values lie within the range −140 to −170, and the average of case 6 is better than the average of case 5.

Figures 3 and 4 show the convergence behavior of the HHSA in the 12 convergence cases for 'Met-enkephalin' during the first 10,000 improvisations. Note that the energy values of each case are obtained from a randomly chosen run out of the 30 runs. The x axis shows the number of iterations, the y axis shows the energy values of each case, and the curves represent the experimented cases. It is apparent that the convergence rate is very fast in the initial iterations, as the curve drops sharply for almost all cases. However, the convergence rate is gradually reduced throughout the 10,000 improvisations until an equilibrium state is reached. Note that cases 6, 10, and 11 obtained the best results, and they converged to almost the same point in the plot.

Fig. 3
figure 3

The best energy values against the number of iterations for the first six convergence cases of Met-enkephalin

Fig. 4
figure 4

The best energy values against the number of iterations for the last six convergence cases of Met-enkephalin

Figures 5 and 6 show the convergence behavior of the HHSA in the 12 convergence cases for the longer protein, '1CRN', during the first 10,000 improvisations. The convergence rate is very fast in the initial iterations because the initial energy values are obtained randomly. However, the convergence rate is gradually reduced in the early improvisations until an equilibrium state is reached. For this long protein, cases 5 and 6 converged faster to the lowest energy values because they have high HMCR values (i.e., 0.90). Comparing cases 5 and 6 with cases 11 and 12 shows the impact of HMS on the behavior of the HHSA; although cases 5 and 6 differ from cases 11 and 12 only in HMS (they have the same HMCR and PSR values), cases 5 and 6 converged faster to the lowest energy than cases 11 and 12, respectively. This indicates that selecting a smaller value of HMS yields better results than a larger HMS.

Fig. 5
figure 5

The best energy values against the number of iterations for the first six convergence cases of 1CRN

Fig. 6
figure 6

The best energy values against the number of iterations for the last six convergence cases of 1CRN

5 Comparative results

5.1 Comparison between AHSA and HHSA

This section provides a comparison between the results of AHSA and HHSA. Cases 3, 4, 7, and 8 of AHSA and cases 4, 5, 10, and 11 of HHSA are used in this comparison. These cases are chosen because they have almost the same parameter design. In Table 3, the best, average, worst, and standard deviation of the energy values are recorded for the 30 runs of every case. The best energy obtained is highlighted in bold.

Table 3 Comparison results between AHSA and HHSA for the two proteins

It is clear from Table 3 that only case 3 of AHSA is able to obtain the best energy value for the small protein, whereas in HHSA, cases 4, 5, and 10 obtain the best energy value and case 11 comes very close to it. For the longer protein, '1CRN', all four cases of HHSA obtained better energy values than those of AHSA.

Moreover, a t test has been performed to determine whether the results of the AHSA and the HHSA are significantly different. Tables 4 and 5 summarize the p values of the t test for 'Met-enkephalin' and '1CRN', respectively. It is clear that, for both the 'Met-enkephalin' and '1CRN' proteins, the results of the HHSA and the AHSA are statistically different in favor of the HHSA.

Table 4 Independent samples test for ‘Met-enkephalin’
Table 5 Independent samples test for ‘1CRN’

5.2 Comparison between HHSA and other studies

Tables 6 and 7 show the results of HHSA in comparison with some previous studies. The proposed HHSA is able to match the best results obtained so far, namely E = −12.43 kcal/mol by Zhan et al. (2006) based on the ECEPP/3 force field and E = −12.91 kcal/mol by Zhan et al. (2006), Meirovitch et al. (1994), and Li and Scheraga (1987) based on the ECEPP/2 force field. Moreover, two new lowest energy values for the Met-enkephalin protein have been recorded by HHSA based on the ECEPP/3 and ECEPP/2 force fields with ω = 180°; the previous best energy based on the ECEPP/3 force field with ω = 180° is E = −10.90 kcal/mol by Zhan et al. (2006), while HHSA records a new energy value, E = −11.26 kcal/mol. Furthermore, based on the ECEPP/2 force field with ω = 180°, the previous lowest energy is E = −10.72 kcal/mol, while the HHSA obtains E = −11.57 kcal/mol.

Table 6 The lowest energies of Met-enkephalin (in kcal/mol) obtained by HHSA compared with previous studies based on ECEPP/3 force fields
Table 7 The lowest energies of Met-enkephalin (in kcal/mol) obtained by HHSA compared with previous studies based on ECEPP/2 force field


The barcharts in Figs. 7 and 8 show the best energies obtained by HHSA based on ECEPP/3 with ω relaxed and ECEPP/3 with ω = 180°, respectively. In comparison with other studies, the barcharts indicate that HHSA obtains the same optimal energy recorded by Zhan et al. (2006), while it outperforms the others. Moreover, based on ECEPP/3 with ω = 180°, HHSA obtains a new energy value and outperforms the previous lowest energies obtained by Zhan et al. (2006) and Eisenmenger and Hansmann (1997).

Fig. 7
figure 7

Barchart showing the results obtained for Met-enkephalin using ECEPP/3 force field

Fig. 8
figure 8

Barchart showing the results obtained for Met-enkephalin using ECEPP/3 with ω = 180°

The barcharts in Figs. 9 and 10 show the best energies obtained by HHSA based on ECEPP/2 with ω relaxed and ECEPP/2 with ω = 180°, respectively. In comparison with other studies, the barcharts indicate that HHSA obtains the same optimal energy recorded by Zhan et al. (2006) and Meirovitch et al. (1994), while it outperforms the others. Furthermore, based on ECEPP/2 with ω = 180°, HHSA obtains a new energy value and outperforms the previous lowest energies obtained by Zhan et al. (2006) and Eisenmenger and Hansmann (1997).

Fig. 9
figure 9

Barchart showing the results obtained for Met-enkephalin using ECEPP/2 force field

Fig. 10
figure 10

Barchart showing the results obtained for Met-enkephalin using ECEPP/2 with ω = 180°

The internal coordinates of the lowest-energy conformations of the 'Met-enkephalin' protein obtained in this study by HHSA and in some previous studies are given in Table 8. The labels E/2 and E/3 mean that the configurations are obtained based on the ECEPP/2 and ECEPP/3 force fields, respectively. If the label has a subscript π, the angle ω is fixed at 180°. The lower the energy, the more stable the protein. The coordinates in Table 8 show the values of the torsion angles of the amino acids once the energy has settled at its lowest value.

Table 8 Internal coordinates of lowest energy for Met-enkephalin in this study and some previous studies

Moreover, the HHSA results are compared with the recent study of Nicosia and Stracquadanio (2009), who studied several proteins, including 'Met-enkephalin', '1CRN', '1E0L', and '1IGD', based on the ECEPP/3 force field with an explicit solvent term. HHSA obtained better energy values for the three proteins 'Met-enkephalin', '1CRN', and '1E0L', but could not reach the same energy value for the longer protein 1IGD, as can be seen in Table 9 and Fig. 11.

Table 9 The lowest energies (in kcal/mol) obtained for benchmark sequences based on ECEPP/3 force fields with explicit solvent
Fig. 11
figure 11

The lowest energies obtained based on ECEPP/3 force field with explicit solvent

However, for the protein '1CRN', HHSA obtained a value of RMSD = 6 Å, which is considered a successful prediction based on CASP6, where the most successful ab initio methods have presented RMSD values ranging from 4 to 6 Å for proteins of fewer than 100 residues (Dorn et al. 2008).

It is worth mentioning that all the methods in our comparison are ab initio methods, as we cannot compare with methods of other categories for two reasons. First, a fair comparison requires the same energy function, since different energy functions give different energy values, and the energy function used in our research is designed for ab initio methods. Second, ab initio methods predict the tertiary structure of a protein from scratch, using no prior knowledge beyond the sequence, while the other categories of methods rely on prior structural knowledge, so a direct comparison is not possible.

5.3 Structures predicted by HHSA compared to the native structures

This section shows the graphical representation of the structures obtained by the HHSA for the four proteins, 'Met-enkephalin', '1CRN', '1E0L', and '1IGD', compared with the native structures of the same proteins in the PDB. Figures 12, 13, 14, and 15 show the graphical representations of the native and predicted structures of the proteins Met-enkephalin, 1E0L, 1CRN, and 1IGD, respectively. The figures show the similarity between the predicted protein and the native one using the same rotation angle, and they demonstrate the effectiveness of HHSA for proteins of up to 50 residues. The figures were generated using PyMOL (DeLano 2002), as used by Zhan et al. (2006). Moreover, the root mean square deviation (RMSD) is calculated for the three proteins 1E0L, 1CRN, and 1IGD to validate the similarity between the native and predicted structures. The HHSA obtained RMSD = 4.5 Å for the protein 1E0L, RMSD = 6 Å for the protein 1CRN, and RMSD = 6.9 Å for the protein 1IGD. An RMSD of 6 Å is considered a successful prediction based on CASP6, where the most successful ab initio methods have presented RMSD values ranging from 4 to 6 Å for proteins of fewer than 100 residues (Dorn et al. 2008).
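
For reference, a minimal sketch of how the RMSD between two already superimposed coordinate sets could be computed (in practice, tools such as PyMOL perform the superposition and the calculation):

    import math

    def rmsd(coords_a, coords_b):
        # Root mean square deviation between two equal-length lists of
        # (x, y, z) coordinates that are already optimally superimposed.
        assert len(coords_a) == len(coords_b)
        sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                 for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
        return math.sqrt(sq / len(coords_a))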

Fig. 12
figure 12

Comparison between native and predicted structures of Met-enkephalin. a Native structure, b predicted structure

Fig. 13
figure 13

Comparison between native and predicted structures of 1E0L, a Native structure, b predicted structure

Fig. 14
figure 14

Comparison between native and predicted structures of 1CRN. a Native structure, b predicted structure

Fig. 15
figure 15

Comparison between native and predicted structures of 1IGD. a Native structure, b predicted structure

6 Conclusions and future work

This paper has presented an HHSA for the ab initio protein tertiary structure prediction problem (PSPP). Initially, the harmony search is adapted to the PSPP; the resulting algorithm is called AHSA. AHSA is the basic HSA (Geem et al. 2001) with adaptive HMCR and PAR. The adaptation takes into consideration the exploration and exploitation of the search space. The HHSA is the AHSA hybridized with ILS to improve the local exploitation and with the global-best concept of PSO to improve the convergence rate.

A parameter sensitivity analysis for HHSA has been conducted using 12 convergence cases, each of which has a particular parameter setting. The results show that, in general, using larger values of HMCR and PSR increases the performance of the HHSA. A comparative study between AHSA and HHSA has also been carried out; the comparative evaluation shows that the HHSA achieves better results than the AHSA. Interestingly, two new best results were obtained by HHSA in comparison with the comparative methods using the same benchmark.

Hybridizing ILS as a local optimizer and the global-best concept as a convergence accelerator with AHSA as a global optimizer has produced a superior method for the PSPP. More benchmarks can be investigated in the future to further assess the performance of the HHSA.