DiffNet: Automatic differential functional summarization of dE-MAP networks
Introduction
High-throughput mapping of genetic interaction networks of a set of genes is an important and emergent research problem [5]. The networks constructed with these methods, however, only represent a static “snapshot” of the genetic interaction map under a particular context or condition. Recent studies have shown that genetic interaction maps are in fact dynamic and context-dependent [18]. Consequently, there is a growing interest in studying the system-wide responses of interaction networks following a change in environment or condition [15], [10]. For instance, one may be interested in elucidating the genetic interaction differences between cancer cells and normal cells. Specifically, some interactions may appear or disappear in the disease state, the intensity of some interactions may be alleviated or aggravated in the disease state compared to the healthy condition, and others may remain strong irrespective of the state.
One representative method that has recently been proposed for mapping the genetic interaction responses following environment change is the dE-MAP approach [2]. In this method, two static gene interaction networks [5], one for each condition, are first obtained using the epistatic miniarray profile (E-MAP) approach [17], which constructs a quantitative genetic interaction landscape of Saccharomyces cerevisiae by first identifying a set of genes of interest. Double mutant strains for all pairs of genes from this set are then grown and their colony sizes measured. A genetic interaction occurs between a pair of mutant genes when the observed colony growth rate of the double mutant is greater or less than expected from the respective single mutant strains. When the growth rate is greater than expected, the interaction is deemed positive (alleviating); when it is less, it is deemed negative (aggravating). Using the two static E-MAP networks, a differential network (dE-MAP network) is then computed that maps the interaction differences between the two static networks. For example, in [2], S. cerevisiae E-MAP networks are obtained for cells grown under two conditions: (a) cells treated with methyl methanesulfonate (MMS), a well-known DNA-damaging agent, and (b) untreated cells. Large-scale genetic interaction networks among 418 yeast genes are quantitatively extracted using the E-MAP method under the MMS-treated (stressed) and untreated (unstressed) conditions, and the differential network that maps the genetic interaction changes due to the MMS challenge is computed. Fig. 1 depicts an example of a differential network (partial view) obtained from two static E-MAP networks under the MMS-treated and untreated conditions.
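To make the construction concrete, the following is a minimal Python sketch of a differential network. It assumes that each static E-MAP network is stored as a mapping from an unordered gene pair to its interaction score, and that the differential weight is simply the treated score minus the untreated score. The names `differential_network`, `pair` and `min_abs_change`, as well as the toy scores, are hypothetical; the dE-MAP study [2] derives its differential scores with its own statistical procedure, which this sketch does not reproduce.

```python
from typing import Dict, FrozenSet

# An unordered gene pair, e.g. frozenset({"A", "B"}).
Pair = FrozenSet[str]
Network = Dict[Pair, float]


def pair(a: str, b: str) -> Pair:
    """Build an order-independent key for a gene pair."""
    return frozenset((a, b))


def differential_network(treated: Network, untreated: Network,
                         min_abs_change: float = 0.0) -> Network:
    """Return treated-minus-untreated scores for pairs measured in both networks.

    Assumption: a plain score difference stands in for the differential score.
    Pairs whose absolute change is below `min_abs_change` are dropped.
    """
    diff: Network = {}
    for p in treated.keys() & untreated.keys():
        change = treated[p] - untreated[p]
        if abs(change) >= min_abs_change:
            diff[p] = change
    return diff


# Toy scores, invented for illustration (not taken from [2]).
untreated = {pair("A", "B"): -3.0, pair("A", "C"): 0.5, pair("B", "C"): 2.0}
treated = {pair("A", "B"): -0.5, pair("A", "C"): 0.5, pair("B", "C"): -1.5}

diff = differential_network(treated, untreated, min_abs_change=1.0)
for p, w in sorted(diff.items(), key=lambda kv: sorted(kv[0])):
    direction = "positive" if w > 0 else "negative"
    print(sorted(p), w, direction)
```

In this toy example the A-B interaction becomes less aggravating under treatment and so receives a positive differential weight, the B-C interaction receives a negative one, and the unchanged A-C interaction is filtered out. Unlike the static networks, the resulting edge weights mix both signs, which is the property that matters for the analysis that follows.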
Naturally, it is important to analyze this differential network to investigate the system-wide impact of the DNA-damaging agent on the functional roles of various components. Consequently, the authors obtained physical protein–protein interactions corresponding to these genes and performed graph clustering to find protein complexes enriched with differential interactions. The functional identity of each cluster was then determined manually. In particular, the authors concluded that these complexes tend to be stable across conditions and that differential interactions largely lie between complexes rather than within them. Unfortunately, modules constructed in this manner poorly represent the functional responses of the differential network. Hence, to find a functional response, the authors manually selected a subset of 31 genes associated with DNA repair to test for differential interaction enrichment, concluding that DNA repair is a pertinent functional response following MMS treatment. However, it is time-consuming, laborious and error-prone to perform large-scale analysis of dE-MAP interactomes in this way to map all pertinent functional responses. In this paper, we propose a novel technique called DiffNet that addresses this impediment by automatically constructing a high-quality differential summary of two E-MAP networks under environmental change. Fig. 2 highlights some of the functional modules that are differentially affected by the DNA-damaging agent.
At first glance, the aforementioned failure of traditional graph clustering techniques to capture differential summaries in their modules may seem surprising. However, as we shall see in Section 4, these techniques are largely designed for static networks and are less suitable for differential networks that contain both positive and negative weights. Furthermore, since most methods rely solely on the topology of the network, there is no guarantee that each cluster corresponds well to a representative biological functional response. Indeed, as remarked earlier, in [2] the functional identity of each cluster following graph clustering is determined manually. Moreover, the authors were unable to assign a function to a significant number of these clusters.
Algorithms that perform genome-wide functional analysis of gene responses under multiple conditions have been proposed in the literature [19], [20], [9]. These approaches, however, perform functional analysis based on the expression levels of genes. In contrast, our problem focuses on genome-wide functional analysis of gene interactions and their responses.
Given the differential network generated from dE-MAP interactions, DiffNet greedily constructs a differential summary comprising a set of skewed and coherent functional subgraphs, representing significant functional responses following environment or condition change. Specifically, it leverages Gene Ontology (GO) annotations to identify these functional subgraphs, each of which represents a group of interactions corresponding to a specific biological function. A key characteristic of these functional subgraphs is that the interactions together respond significantly in one direction, either positively or negatively, to the condition change. That is, unlike standard graph clustering methods, DiffNet is specifically designed to handle differential interactions, which can be positively or negatively weighted. Fig. 3 illustrates the idea of the DiffNet algorithm. We shall elaborate on it in the next section.
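The notion of a functional subgraph whose interactions respond in one direction can be illustrated with a small Python sketch. It assumes GO annotations are available as a mapping from a term to the set of genes it annotates, takes the subgraph induced by each term's genes, and scores it with a hypothetical `sign_skew` measure (the sum of weights divided by the sum of absolute weights). This score, the `min_edges` cutoff, the term and gene names, and the simple ranking are all assumptions made for illustration; they are not DiffNet's actual definitions of skewness and coherence, nor its greedy selection procedure.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]
Network = Dict[Pair, float]


def pair(a: str, b: str) -> Pair:
    """Build an order-independent key for a gene pair."""
    return frozenset((a, b))


def induced_edges(diff: Network, genes: Set[str]) -> Network:
    """Keep differential interactions whose two genes both carry the annotation."""
    return {p: w for p, w in diff.items() if p <= genes}


def sign_skew(edges: Network) -> float:
    """Hypothetical skew in [-1, 1]: +1 if all weight is positive, -1 if all is negative."""
    total = sum(abs(w) for w in edges.values())
    return sum(edges.values()) / total if total > 0 else 0.0


def rank_terms(diff: Network, annotations: Dict[str, Set[str]],
               min_edges: int = 2) -> List[Tuple[str, float]]:
    """Rank terms by how one-sided their induced differential interactions are."""
    ranked = []
    for term, genes in annotations.items():
        edges = induced_edges(diff, genes)
        if len(edges) >= min_edges:  # ignore terms with too little evidence
            ranked.append((term, sign_skew(edges)))
    return sorted(ranked, key=lambda ts: (-abs(ts[1]), ts[0]))


# Toy differential network and annotations, invented for illustration.
diff = {
    pair("g1", "g2"): 2.0, pair("g1", "g3"): 1.5, pair("g2", "g3"): 1.0,
    pair("g3", "g4"): -2.0, pair("g4", "g5"): 2.0, pair("g3", "g5"): 0.5,
}
annotations = {
    "term_X": {"g1", "g2", "g3"},
    "term_Y": {"g3", "g4", "g5"},
    "term_Z": {"g1", "g5"},
}

ranked = rank_terms(diff, annotations)
for term, skew in ranked:
    print(term, round(skew, 3))
```

Here `term_X` ranks first because all of its induced interactions change in the same direction, whereas the mixed signs within `term_Y` largely cancel out and `term_Z` has no induced interactions at all. A topology-only clustering would not distinguish these cases, which is the motivation for treating the sign of the response explicitly.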
Summary of proposed method
DiffNet is a novel data-driven algorithm that automatically summarizes a dE-MAP network to obtain a high-level map of functional responses due to condition change.
- Input: A dE-MAP network.
- Output: A high-level summary of functional responses (both positive and negative responses) due to condition change.
- Tools used in the proposed method: Scala.
- Databases, if any, used in the proposed method: Gene Ontology Annotations dataset (GOA).
Constructing differential networks
The set of genes of interest together with their genetic interactions can be modeled as a gene–gene interaction network, denoted by G = (V, E, w), where V is a set of genes selected for E-MAP study, E denotes the pairwise interactions between genes, and w is a function that assigns each pairwise interaction a weight that represents its interaction strength. In E-MAP studies, the weight w(e) of an interaction e in E is given by its genetic interaction score, the S-score [17]. A positive S-score indicates the degree of
Results
The DiffNet algorithm is implemented in Scala. We now present experimental results of the performance of DiffNet. The experiments were conducted on a 1.66 GHz Intel Core 2 Duo T5450 machine with 3 GB memory. Unless specified otherwise, we set and .
Conclusions
We propose DiffNet, a novel data-driven algorithm that automatically constructs summaries of differential functional responses of gene interaction networks under environment or condition change. Specifically, it leverages a combination of GO annotation information and the underlying interaction data to greedily identify a set of functional subgraphs that are highly skewed and coherent, representing significant functional responses due to condition change. Our empirical study with a real-world network
References (21)
- et al., Cell (2007)
- et al., Methods (2006)
- et al., BMC Bioinformatics (2003)
- et al., Science (2010)
- et al., Bioinformatics (2004)
- Math. Oper. Res. (1979)
- et al., Genome Biol. (2006)
- et al., Nucleic Acids Res. (2002)
- et al., Science (2001)
- et al., Science (2007)