Using programmatic motifs and genetic programming to classify protein sequences as to cellular location

  • Conference paper
Evolutionary Programming VII (EP 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1447)

Abstract

As newly sequenced proteins are deposited into the world's ever-growing archives, they are typically tested immediately by various algorithms for clues as to their biological structure and function. One question about a new protein concerns its cellular location, that is, where the protein resides in a living organism (e.g., extracellular, membrane, nuclear). A human-created five-way algorithm that predicts cellular location using statistical techniques was recently reported to achieve 76% accuracy. This paper describes a two-way algorithm, evolved using genetic programming, that determines whether a protein is an extracellular protein with 83% accuracy; the corresponding figures are 84% for nuclear proteins, 89% for membrane proteins, and 83% for anchored membrane proteins. Unlike the statistical calculation, the genetically evolved programs employ a large and varied arsenal of computational capabilities, including arithmetic functions, conditional operations, subroutines, iterations, named memory, indexed memory, set-creating operations, and look-ahead. The genetically evolved classification program can be viewed as an extension (which we call a programmatic motif) of the conventional notion of a protein motif.
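To make the idea of a programmatic motif more concrete, the following is a minimal hand-written sketch in Python of a two-way classifier that combines several of the capabilities listed in the abstract: iteration over the sequence, a set of residues, named memory, a conditional, arithmetic, and one-residue look-ahead. It is not one of the evolved programs from the paper. The residue set `HYDROPHOBIC`, the memory cell name `M0`, the scoring weights, and the function names are all assumptions chosen purely for illustration.

```python
# Illustrative sketch only: a hand-written toy, NOT a program evolved in the paper.
# HYDROPHOBIC, the "M0" memory cell, and the weights are hypothetical choices.

HYDROPHOBIC = frozenset("AVLIMFWC")  # a residue set (stands in for a set-creating operation)


def programmatic_motif(sequence: str, lookahead: int = 1) -> bool:
    """Toy two-way classifier: True means 'in the class', False means 'not in the class'."""
    memory = {"M0": 0.0}  # named memory that persists across iterations
    for i, residue in enumerate(sequence):  # iteration over the protein sequence
        # Look-ahead: peek at a later residue, or None past the end of the sequence.
        j = i + lookahead
        upcoming = sequence[j] if j < len(sequence) else None
        if residue in HYDROPHOBIC:  # conditional operation on set membership
            # Arithmetic: reward runs of hydrophobic residues more strongly.
            memory["M0"] += 2.0 if upcoming in HYDROPHOBIC else 1.0
        else:
            memory["M0"] -= 1.0
    # The sign of the accumulated value gives the two-way decision.
    return memory["M0"] > 0.0


def accuracy(predictions, labels) -> float:
    """Fraction of predictions that agree with the known labels."""
    pairs = list(zip(predictions, labels))
    return sum(p == y for p, y in pairs) / len(pairs)


if __name__ == "__main__":
    toy_sequences = ["LLLLVVVV", "DEKRDEKR"]  # made-up sequences, not real proteins
    toy_labels = [True, False]
    predicted = [programmatic_motif(s) for s in toy_sequences]
    print(predicted, accuracy(predicted, toy_labels))
```

In a genetic programming setting, the body of such a function would be the evolved part, and a measure like `accuracy` on labeled training sequences could plausibly serve as part of the fitness evaluation; the paper's actual fitness measure is not given in the abstract.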

Editor information

V. W. Porto, N. Saravanan, D. Waagen, A. E. Eiben

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Koza, J.R., Bennett, F.H., Andre, D. (1998). Using programmatic motifs and genetic programming to classify protein sequences as to cellular location. In: Porto, V.W., Saravanan, N., Waagen, D., Eiben, A.E. (eds) Evolutionary Programming VII. EP 1998. Lecture Notes in Computer Science, vol 1447. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0040796

  • DOI: https://doi.org/10.1007/BFb0040796

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64891-8

  • Online ISBN: 978-3-540-68515-9

  • eBook Packages: Springer Book Archive
