Abstract
The degree distribution has been viewed as an important characteristic of network data. Many biological networks have been labelled scale-free because their degree distributions can be approximately described by a power-law probability distribution. This chapter presents a formal statistical model selection procedure that determines which functional form, from a collection of specified models, best describes the degree distribution of network data. The degree distribution observed in empirical data is treated as belonging to a class of probability models, and the model that best describes the data is identified within a maximum likelihood framework. These statistical tests do not confirm the true underlying distribution of the observed data; they show only which models from a chosen set best describe the data. In practice, these approaches should be viewed as providing evidence for which probability models do not adequately (or optimally) describe the data, and as giving an indication of the underlying sampling and true interaction properties of the system considered.
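As a rough illustration of the kind of procedure the abstract describes, the sketch below fits a small set of candidate models to a list of node degrees by maximum likelihood and ranks them with the Akaike information criterion (AIC). The candidate set (a shifted geometric, a shifted Poisson, and a discrete power law on degrees k >= 1), the function names, and the synthetic degree list are all assumptions made for this example; they are not taken from the chapter, whose own model collection and criteria may differ.

```python
import math

def fit_geometric(degrees):
    """Shifted geometric on k >= 1: P(k) = p (1 - p)^(k - 1). MLE: p = 1 / mean."""
    n, total = len(degrees), sum(degrees)
    p = n / total  # requires mean degree > 1
    loglik = n * math.log(p) + (total - n) * math.log(1.0 - p)
    return p, loglik

def fit_shifted_poisson(degrees):
    """Shifted Poisson on k >= 1: (k - 1) ~ Poisson(lam). MLE: lam = mean - 1."""
    n, total = len(degrees), sum(degrees)
    lam = total / n - 1.0  # requires mean degree > 1
    # math.lgamma(k) equals log((k - 1)!), the Poisson normalising term.
    loglik = sum((k - 1) * math.log(lam) - lam - math.lgamma(k) for k in degrees)
    return lam, loglik

def zeta(gamma, terms=1000):
    """Riemann zeta for gamma > 1: partial sum plus an Euler-Maclaurin tail."""
    head = sum(k ** -gamma for k in range(1, terms))
    return head + terms ** (1.0 - gamma) / (gamma - 1.0) + 0.5 * terms ** -gamma

def fit_power_law(degrees, lo=1.05, hi=6.0, tol=1e-6):
    """Discrete power law on k >= 1: P(k) = k^(-gamma) / zeta(gamma)."""
    n = len(degrees)
    sum_log = sum(math.log(k) for k in degrees)

    def neg_loglik(g):
        return g * sum_log + n * math.log(zeta(g))

    # Golden-section search; the log-likelihood is concave in gamma.
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if neg_loglik(c) < neg_loglik(d):
            b = d
        else:
            a = c
    g = 0.5 * (a + b)
    return g, -neg_loglik(g)

def select_model(degrees):
    """Fit every candidate, then rank by AIC (each model has one parameter)."""
    fits = {
        "geometric": fit_geometric(degrees),
        "shifted_poisson": fit_shifted_poisson(degrees),
        "power_law": fit_power_law(degrees),
    }
    aic = {name: 2 * 1 - 2 * ll for name, (_, ll) in fits.items()}
    best = min(aic, key=aic.get)
    # Akaike weights: relative support for each model within this set only.
    rel = {name: math.exp(-0.5 * (a - aic[best])) for name, a in aic.items()}
    norm = sum(rel.values())
    weights = {name: r / norm for name, r in rel.items()}
    return fits, aic, weights, best

# Synthetic, heavy-tailed degree list used purely for illustration.
degrees = [1] * 60 + [2] * 15 + [3] * 7 + [4] * 4 + [5] * 3 + [8] * 2 + [12, 20, 35]
fits, aic, weights, best = select_model(degrees)
for name in sorted(aic, key=aic.get):
    print(name, "param=%.3f" % fits[name][0], "AIC=%.1f" % aic[name],
          "weight=%.3g" % weights[name])
```

The Akaike weights sum to one over the chosen candidates, so a high weight says only that a model is preferred within that set. A model outside the set could still describe the data better, which is the caveat the abstract raises.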
Acknowledgments
This work has been funded by the Wellcome Trust and the BBSRC. MPHS is a Royal Society Wolfson Research Merit Award holder.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Kelly, W.P., Ingram, P.J., Stumpf, M.P.H. (2012). The Degree Distribution of Networks: Statistical Model Selection. In: van Helden, J., Toussaint, A., Thieffry, D. (eds) Bacterial Molecular Networks. Methods in Molecular Biology, vol 804. Springer, New York, NY. https://doi.org/10.1007/978-1-61779-361-5_13
DOI: https://doi.org/10.1007/978-1-61779-361-5_13
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-61779-360-8
Online ISBN: 978-1-61779-361-5
eBook Packages: Springer Protocols