Abstract
As newly sequenced proteins are deposited into the world's ever-growing archives, they are typically tested immediately by various algorithms for clues to their biological structure and function. One question about a new protein concerns its cellular location, that is, where the protein resides in a living organism (e.g., extracellular, membrane, nuclear). A human-created five-way algorithm that predicts cellular location using statistical techniques was recently reported to achieve 76% accuracy. This paper describes a two-way algorithm, evolved using genetic programming, that achieves 83% accuracy in determining whether a protein is an extracellular protein, 84% for nuclear proteins, 89% for membrane proteins, and 83% for anchored membrane proteins. Unlike the statistical calculation, the genetically evolved programs employ a large and varied arsenal of computational capabilities, including arithmetic functions, conditional operations, subroutines, iterations, named memory, indexed memory, set-creating operations, and look-ahead. The genetically evolved classification program can be viewed as an extension (which we call a programmatic motif) of the conventional notion of a protein motif.
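To make the idea of a programmatic motif concrete, the following minimal Python sketch shows what a two-way classifier built from the kinds of operations listed above (iteration, named memory, conditionals, set membership, look-ahead, arithmetic) might look like. It is a hand-written illustration, not the evolved program from the paper: the function name `programmatic_motif`, the residue sets, the window size, and every threshold are assumptions chosen only for the example.

```python
# Hypothetical illustration only: NOT the evolved program reported in the paper.
# The residue sets, window size, and thresholds below are assumed for the example.
HYDROPHOBIC = frozenset("AVLIMFC")  # assumed hydrophobic grouping
CHARGED = frozenset("DEKR")         # assumed charged grouping


def programmatic_motif(sequence: str, window: int = 3) -> bool:
    """Toy two-way classifier: True means the sequence is labeled positive."""
    score = 0.0       # named memory: running accumulator
    run_length = 0    # named memory: length of the current hydrophobic run
    longest_run = 0   # named memory: longest hydrophobic run seen so far

    for i, residue in enumerate(sequence):      # iteration over the residues
        if residue in HYDROPHOBIC:              # set-membership test
            run_length += 1
            score += 1.0
        else:
            run_length = 0
            if residue in CHARGED:              # conditional operation
                score -= 0.5
        longest_run = max(longest_run, run_length)

        # Look-ahead: inspect the next `window` residues without consuming them.
        ahead = sequence[i + 1 : i + 1 + window]
        if residue in CHARGED and ahead and all(r in HYDROPHOBIC for r in ahead):
            score += 2.0

    # Final decision combines arithmetic on the memory cells with a conditional.
    return longest_run >= 8 or score / max(len(sequence), 1) > 0.4


if __name__ == "__main__":
    print(programmatic_motif("AAAAAAAAAA"))  # long hydrophobic run
    print(programmatic_motif("DEKRDEKR"))    # charged residues only
```

Unlike a conventional motif, which matches a fixed pattern of residues, a program of this shape can accumulate evidence across the whole sequence before deciding. In genetic programming the structure and constants of such a program would be discovered by evolution rather than written by hand.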
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Koza, J.R., Bennett, F.H., Andre, D. (1998). Using programmatic motifs and genetic programming to classify protein sequences as to cellular location. In: Porto, V.W., Saravanan, N., Waagen, D., Eiben, A.E. (eds) Evolutionary Programming VII. EP 1998. Lecture Notes in Computer Science, vol 1447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0040796
DOI: https://doi.org/10.1007/BFb0040796
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64891-8
Online ISBN: 978-3-540-68515-9
eBook Packages: Springer Book Archive