Functional insights from the distribution and role of homopeptide repeat-containing proteins

  1. Noel G. Faux1,2,3,
  2. Stephen P. Bottomley1,2,
  3. Arthur M. Lesk1,2,6,
  4. James A. Irving1,2,3,
  5. John R. Morrison5,
  6. Maria Garcia de la Banda2,3,4,7, and
  7. James C. Whisstock1,2,3,7
  1. 1 Protein Crystallography Unit, Department of Biochemistry and Molecular Biology, School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, VIC 3800, Australia
  2. 2 Victorian Bioinformatics Consortium, School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, VIC 3800, Australia
  3. 3 ARC Centre for Structural and Functional Microbial Genomics, School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, VIC 3800, Australia
  4. 4 School of Computer Science and Software Engineering, Monash University, Clayton Campus, Melbourne, VIC 3800, Australia
  5. 5 Monash Institute of Reproduction and Development, Monash University, Clayton, VIC 3168, Australia
  6. 6 Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA

Abstract

Expansion of “low-complexity” repeats of amino acids such as glutamine (poly-Q) is associated with protein misfolding and the development of degenerative diseases such as Huntington's disease. The mechanism by which such regions promote misfolding remains controversial, the function of many repeat-containing proteins (RCPs) remains obscure, and the role (if any) of repeat regions remains to be determined. Here, a Web-accessible database of RCPs is presented. The distribution and evolution of RCPs that contain homopeptide repeat tracts are considered, and the existence of functional patterns is investigated. Generally, it is found that while polyamino acid repeats are extremely rare in prokaryotes, several eukaryote putative homologs of prokaryote RCPs (involved in important housekeeping processes) retain the repetitive region, suggesting an ancient origin for certain repeats. Within eukarya, the most common uninterrupted amino acid repeats are glutamine, asparagine, and alanine. Interestingly, while poly-Q repeats are found in vertebrates and nonvertebrates, poly-N repeats are only common in more primitive nonvertebrate organisms, such as insects and nematodes. We have assigned function to eukaryote RCPs using Online Mendelian Inheritance in Man (OMIM), the Human Protein Reference Database (HPRD), FlyBase, and Wormpep. Prokaryote RCPs were annotated using BLASTp searches and Gene Ontology. These data reveal that the majority of RCPs are involved in processes that require the assembly of large, multiprotein complexes, such as transcription and signaling.
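As an illustration of what an "uninterrupted amino acid repeat" means in practice, the following minimal Python sketch scans a protein sequence for runs of a single residue. It is not the authors' pipeline: the function name `find_homopeptide_repeats`, the `Repeat` record, and the default minimum run length of 5 are assumptions made for this example, since the abstract does not state the threshold used to define a repeat.

```python
import re
from typing import Iterator, NamedTuple


class Repeat(NamedTuple):
    """One uninterrupted run of a single amino acid (hypothetical record type)."""
    residue: str  # one-letter amino acid code, e.g. "Q"
    start: int    # 0-based index of the first residue in the run
    length: int   # number of consecutive identical residues


def find_homopeptide_repeats(seq: str, min_len: int = 5) -> Iterator[Repeat]:
    """Yield every uninterrupted single-residue run of at least min_len.

    min_len=5 is an illustrative assumption, not a value from the paper.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([A-Z]) captures one residue; \1{n,} requires n or more further copies.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    for match in pattern.finditer(seq.upper()):
        yield Repeat(match.group(1), match.start(), match.end() - match.start())


if __name__ == "__main__":
    # Toy sequence containing a poly-Q run and a poly-N run.
    for repeat in find_homopeptide_repeats("MKQQQQQQAANNNNNLA"):
        print(repeat)
```

On the toy sequence, the two-residue `AA` stretch is ignored because it falls below the assumed threshold, while the poly-Q and poly-N runs are reported with their positions and lengths.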

Footnotes

  • [Supplemental material is available online at www.genome.org.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3096505.

  • 7 Corresponding authors. E-mail James.Whisstock{at}med.monash.edu.au; fax +613 9905 3726. E-mail Maria.GarciadelaBanda{at}infotech.monash.edu.au; fax +613 9905 5146.

    • Accepted January 20, 2005.
    • Received August 3, 2004.