Generating weighted and thresholded gene coexpression networks using signed distance correlation

Published online by Cambridge University Press:  16 June 2022

Javier Pardo-Diaz
Affiliation:
Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
Philip S. Poole
Affiliation:
Department of Plant Sciences, University of Oxford, Oxford OX1 3RB, UK
Mariano Beguerisse-Díaz
Affiliation:
Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
Charlotte M. Deane
Affiliation:
Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
Gesine Reinert*
Affiliation:
Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
*Corresponding author. Email: reinert@stats.ox.ac.uk

Abstract

Even within well-studied organisms, many genes lack useful functional annotations. One way to generate such functional information is to infer biological relationships between genes or proteins, using a network of gene coexpression data that includes functional annotations. Signed distance correlation has proved useful for the construction of unweighted gene coexpression networks. However, transforming correlation values into unweighted networks may lead to a loss of important biological information related to the intensity of the correlation. Here, we introduce a principled method to construct weighted gene coexpression networks using signed distance correlation. These networks contain weighted edges only between those pairs of genes whose correlation value is higher than a given threshold. We analyze data from different organisms and find that networks generated with our method based on signed distance correlation are more stable and capture more biological information compared to networks obtained from Pearson correlation. Moreover, we show that signed distance correlation networks capture more biological information than unweighted networks based on the same metric. While we use biological data sets to illustrate the method, the approach is general and can be used to construct networks in other domains. Code and data are available on https://github.com/javier-pardodiaz/sdcorGCN.
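The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their code is at the repository linked above). It assumes that the signed distance correlation of two expression profiles is their sample distance correlation (Székely et al., 2007) carrying the sign of their Pearson correlation, and that an edge is kept, with that value as its weight, only when the value exceeds a user-supplied threshold. The function names, the `threshold` parameter, and the toy data are hypothetical, and the sketch does not cover how a threshold should be chosen.

```python
import math


def _centered_distances(x):
    """Double-centered matrix of pairwise absolute differences of x."""
    n = len(x)
    d = [[abs(xi - xj) for xj in x] for xi in x]
    row_means = [sum(row) / n for row in d]  # equal to column means (d is symmetric)
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Mean of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    a, b = _centered_distances(x), _centered_distances(y)
    denom = math.sqrt(_mean_product(a, a) * _mean_product(b, b))
    if denom == 0:
        return 0.0  # at least one profile is constant
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(_mean_product(a, b), 0.0) / denom)


def pearson_sign(x, y):
    """Sign (-1, 0 or 1) of the sample covariance, hence of the Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (cov > 0) - (cov < 0)


def signed_distance_correlation(x, y):
    """Assumed definition: distance correlation with the sign of Pearson correlation."""
    return pearson_sign(x, y) * distance_correlation(x, y)


def weighted_thresholded_network(expression, threshold):
    """Return {(gene_a, gene_b): weight} for pairs whose value exceeds threshold.

    expression maps a gene name to its expression profile (hypothetical layout).
    """
    genes = sorted(expression)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            w = signed_distance_correlation(expression[g], expression[h])
            if w > threshold:
                edges[(g, h)] = w  # keep the correlation value as the edge weight
    return edges


# Toy data: geneB rises with geneA, geneC falls as geneA rises.
toy_expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 4, 3, 2, 1],
}
network = weighted_thresholded_network(toy_expression, threshold=0.7)
print(network)
```

In this toy example only the geneA and geneB pair survives the threshold, because the two pairs involving geneC have negative signed values. An unweighted network would record only that the edge exists, whereas the weighted version also retains the strength of the association.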

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Footnotes

Action Editor: Christoph Stadtfeld

A preliminary version of this paper was presented at the Ninth International Conference on Complex Networks and their Applications (COMPLEX NETWORKS 2020).
