Abstract
The ultimate goal of functional genomics is to define the function of all the genes in the genome of an organism. A large body of information on the biological roles of genes has accumulated over the past decades of research, both from traditional experiments detailing the roles of individual genes and proteins and from newer experimental strategies that aim to characterize gene function on a genomic scale.
The goal of functional genomics can only be achieved by integrating the information and data produced by this variety of experiments. Data integration is thus an important challenge for bioinformatics.
Integrating different data sources often helps to uncover non-obvious relationships between genes, and it has two further benefits. First, information on which multiple independent sources agree is likely to be more valid and reliable. Second, the union of multiple sources covers a larger part of the genome. This is obvious when integrating results from multiple single-gene or single-protein experiments, but it is also necessary for many genome-wide experiments, since their results are often confined to certain (although sizable) subsets of the genome.
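The two benefits can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the gene identifiers, the reference labels, and the two sources are invented, and the helper names `accuracy` and `coverage` are hypothetical. The toy data are constructed so that the agreeing calls happen to be fully correct; this shows the idea and is not a general guarantee.

```python
# Toy illustration only: gene IDs, reference labels, and source calls are
# invented; they are not data from the paper.

# Reference labels: True means the gene belongs to a protein complex.
truth = {"g1": True, "g2": True, "g3": False,
         "g4": False, "g5": True, "g6": False}

# Each hypothetical source makes calls for only a subset of the genome.
source_a = {"g1": True, "g2": True, "g3": True, "g4": False}
source_b = {"g1": True, "g3": False, "g4": False, "g5": True, "g6": True}


def accuracy(calls, reference):
    """Fraction of calls matching the reference, i.e. (TP + TN) / all calls."""
    correct = sum(1 for gene, call in calls.items() if reference[gene] == call)
    return correct / len(calls)


def coverage(genes, genome_size):
    """Fraction of the genome for which at least one call exists."""
    return len(genes) / genome_size


# Benefit 1: keep only genes that both sources call, and call identically.
shared = source_a.keys() & source_b.keys()
agreed = {g: source_a[g] for g in shared if source_a[g] == source_b[g]}

# Benefit 2: the union of the sources covers more genes than either alone.
covered = source_a.keys() | source_b.keys()

print("accuracy A:", accuracy(source_a, truth))
print("accuracy B:", accuracy(source_b, truth))
print("accuracy where A and B agree:", accuracy(agreed, truth))
print("coverage A:", coverage(source_a, len(truth)))
print("coverage B:", coverage(source_b, len(truth)))
print("coverage of union:", coverage(covered, len(truth)))
```

In this toy setting the agreeing subset is small but more accurate than either source, while the union reaches every gene. The trade-off between the two is what a real integration procedure has to balance.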
In this paper, we explore an example of such a data integration procedure. We focus on predicting whether individual genes are members of protein complexes. For this, we draw on six different data sources that include expression profiles, interaction data, essentiality, and localization information. Each of these data sources on its own contains only weakly predictive information about protein complexes, but we show how the prediction can be improved by combining all of them. Supplementary information is available at http://bioinfo.mbb.yale.edu/integrate/interactions/.
Abbreviations: TP: true positive; TN: true negative; FP: false positive; FN: false negative; Y2H: yeast two-hybrid.
Cite this article
Jansen, R., Lan, N., Qian, J. et al. Integration of genomic datasets to predict protein complexes in yeast. J Struct Func Genom 2, 71–81 (2002). https://doi.org/10.1023/A:1020495201615