Introduction

Molecular interaction networks are ubiquitous in biological systems. Over the last decade, interaction networks have advanced our systems-level understanding of biology1. Further, they have enabled the discovery of biologically significant, yet previously unmapped relationships2, including drug–target interactions (DTIs)3, drug–drug interactions (DDIs)4, protein–protein interactions (PPIs)5, and gene–disease interactions (GDIs)6. To assist in these discoveries, a plethora of computational methods, primarily optimized for link prediction on networks (e.g.,7), has been developed to predict new interactions in molecular networks. Recently, deep learning on graphs has emerged as a dominant class of methods that has revolutionized the state of the art in learning and reasoning over network datasets. These methods, often referred to as graph neural networks (GNNs)8 and graph convolutional networks (GCNs)9,10, operate by performing a series of non-linear transformations on the input molecular network, where each transformation aggregates information only from immediate neighbors, i.e., direct interactors in the network. While these methods yield powerful predictors, they explicitly take into account only direct similarity between nodes in the network. Therefore, GNNs are limited in their ability to fully capture information important for prediction that resides farther away in the network from the particular interaction we want to predict11.

Indirect similarity between nodes that do not directly interact, e.g., similarity in second-order interactions, has proved incredibly useful across a variety of molecular networks, including genetic interaction and protein–protein interaction networks12,13,14,15. This is because interactions can exist between nodes that are not necessarily similar, as illustrated in Fig. 1. For example, in a drug–target interaction (DTI) network, an edge indicates that a drug binds to a target protein. Thus, two drugs are similar if they bind to the same target protein. In contrast, a drug and a target protein are not biologically similar, although they are connected by an edge in the DTI network. This example illustrates the importance of second-order interactions, which we refer to as skip similarity (Fig. 1). For this reason, we need GNNs that predict molecular interactions not only via direct interactions but also via similarity in second-order interactions.

Figure 1

Direct versus skip similarity. (Left) Traditionally, an interaction between nodes A and B implies that A and B are similar and vice versa16. (Right) In contrast, in molecular interaction networks, directly interacting entities are not necessarily similar, which has been observed in numerous networks, including genetic interaction networks12,13 and protein–protein interaction networks14,15.

Present work

Here, we present SkipGNN, a graph neural network (GNN) method for the prediction of molecular interactions. In contrast to existing GNNs, such as GCN9, SkipGNN specifies a neural architecture in which neural messages are passed not only via direct interactions, referred to as direct similarity, but also via similarity in second-order interactions, referred to as skip similarity (Fig. 1). Importantly, while the principle of skip similarity governs many types of molecular interaction networks, popular GNN methods fail to capture it and, as we show here, therefore cannot fully utilize molecular interaction networks. SkipGNN takes as input a molecular interaction network and uses it to construct a skip graph, a second-order network representation that captures the skip similarity. SkipGNN then uses both the original graph (i.e., the input interaction network) and the skip graph to learn the best way to propagate and transform neural messages along the edges of each graph so as to optimize the discovery of new interactions.

We evaluate SkipGNN on four types of interaction networks, including two homogeneous networks, i.e., drug–drug interaction and protein–protein interaction networks, and two heterogeneous networks, i.e., drug–target interaction and gene–disease interaction networks. SkipGNN outperforms baselines that use random walks, shallow network embeddings, spectral clustering, network metrics and various state-of-the-art graph neural networks11,15,17,18,19,20,21.

By examining SkipGNN’s performance in increasingly difficult prediction settings, in which large fractions of interactions are removed from the network, we find that SkipGNN achieves robust performance. In particular, across all interaction networks, SkipGNN consistently outperforms all baseline methods, even when interaction networks are highly incomplete (“Predicting molecular interactions” and “Robust learning on incomplete interaction networks” sections). We find that the robust performance of SkipGNN can be explained by the spectral properties of the skip graph, which preserves network structure in the face of incomplete interaction information (Supplementary D); this is also confirmed experimentally (“Ablation studies” section).

Further, we examine the embeddings learned by SkipGNN and find that SkipGNN learns biologically meaningful embeddings, whereas a regular GCN does not (“SkipGNN learns meaningful embedding spaces” section). For example, when analyzing a drug–target interaction network, SkipGNN generates an embedding space in which drugs are generally separated from most proteins while still being positioned close to the proteins to which they directly bind. Lastly, in the case of the drug–drug interaction network, we use a literature search to find evidence for SkipGNN’s novel drug–drug interaction predictions (“Investigation of SkipGNN’s novel predictions” section).

Related work

Existing link prediction methods belong to one of the following categories. (1) Heuristic or mechanistic methods (e.g.,15,22,23,24) calculate a similarity index to measure the probability of a link given the network structure around the two target nodes, such as Preferential Attachment (PA)25 and the Local Path Index (LP)26. However, these methods usually make strong assumptions about the network structure and hence suffer from unstable performance15,22. (2) Direct embedding methods generate an embedding for every node in the network that captures the node’s local network topology (e.g.,27,28,29). A popular approach is to use random walks with a skip-gram model, as in DeepWalk17, node2vec18, and LINE30. Another popular approach leverages spectral graph theory to generate spectral embeddings, as in spectral clustering20. The generated node embeddings are then fed into a decoder classifier that predicts the probability that a link exists. (3) Neural embedding methods, such as Graph Neural Networks (GNNs)9,31, Variational Graph Autoencoders (VGAE)32,33, and Graph Attention Networks (GAT)10, use a neighborhood message-passing scheme to generate node embeddings, and these embeddings are optimized end-to-end with a link prediction loss (e.g., cross-entropy). GNNs are a powerful class of models for capturing complicated graph topology. Typically, an L-layer GNN propagates information from nodes in the L-hop neighborhood9,21. However, messages from nodes farther away from the central node are increasingly discounted during propagation. Thus, the vanilla GNN is limited in capturing skip similarity, which arises from second-hop neighbors. In contrast, SkipGNN utilizes an additional skip graph to fully exploit this important property of biomedical interaction networks. Notably, recent GNN advances such as MixHop11 and JK-Net34 are designed to capture higher-order graph structure through skip connections and higher-order adjacency matrices. However, they are motivated by general network models and do not propose a solution to the specific challenge of two-hop skip similarity in biomedical networks.

In molecular interaction networks, the goal is to predict whether a given pair of biomedical entities, such as proteins, drugs or diseases, will interact. We can divide methods for interaction prediction into three main groups. (1) Structural representation learning generates embeddings for each entity using the entity’s structural representation, such as a compound’s molecular graph or a protein’s amino acid sequence. The embeddings of two entities are then combined and fed into a decoder for prediction. For example,35,36,37 use graph-convolutional (GCN) and convolutional (CNN) networks on molecular graphs and gene sequence data to predict the binding of drugs to target proteins. Similarly,38,39,40 learn embeddings for drugs and concatenate the embeddings of drug pairs to predict drug–drug interactions. (2) Similarity-based learning is based on the assumption that entities with similar interaction patterns are likely to interact. These methods devise a similarity measure (e.g., a graphlet-based signature of proteins in the PPI network41) and then use the measure to predict interactions based on how similar a candidate interaction is to known interactions. A variety of techniques are used to aggregate similarity values and score interactions, including matrix factorization42, clustering43, and label propagation44. (3) Finally, network relational learning views the task as a network completion problem. It uses network structure together with side information about nodes to complete the network and predict interactions4,33,45. SkipGNN belongs to the structural representation learning category.

Preliminaries on graph neural networks (GNNs)

Next, we describe graph neural networks, as they are one of the state-of-the-art model classes for link prediction and are also the focus of our study. The input to a GNN is the network, represented by its adjacency matrix \({\mathbf {A}}\). Most often, the goal (output) of the GNN is to learn an embedding for each node in the network by capturing the network structure as well as node attributes. A GNN can be represented as a series of neighborhood aggregation layers (e.g.,9): \({\mathbf {H}}^{(l+1)} = \sigma (\widetilde{{\mathbf {D}}}^{-\frac{1}{2}} \widetilde{{\mathbf {A}}}\widetilde{{\mathbf {D}}}^{-\frac{1}{2}} {\mathbf {H}}^{(l)}{\mathbf {W}})\), where \({\mathbf {H}}^{(l)}\) is the matrix of node embeddings at the lth layer, \({\mathbf {H}}^{(0)}\) are the input node attributes, \({\mathbf {W}}\) is a trainable parameter matrix, \(\sigma\) is a non-linear activation function, and \(\widetilde{{\mathbf {D}}}\) and \(\widetilde{{\mathbf {A}}}\) are the renormalized degree and adjacency matrices, defined as \(\widetilde{{\mathbf {A}}} = {\mathbf {A}} + {\mathbf {I}}\) and \(\widetilde{{\mathbf {D}}}_{ii} = \sum _j \widetilde{{\mathbf {A}}}_{ij}\) (\({\mathbf {I}}\) is the identity matrix). The GNN propagates information across network neighborhoods and transforms the information in a way that is most useful for downstream prediction tasks, such as link prediction.
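To make the propagation rule concrete, the following is a minimal PyTorch sketch of a single layer of this form; the function and variable names are ours and are intended for illustration only, not as the released SkipGNN code.

```python
import torch

def renormalize(A: torch.Tensor) -> torch.Tensor:
    """Compute D~^{-1/2} A~ D~^{-1/2} with A~ = A + I (renormalized adjacency)."""
    A_tilde = A + torch.eye(A.size(0))
    d = A_tilde.sum(dim=1)
    D_inv_sqrt = torch.diag(d.pow(-0.5))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_layer(A: torch.Tensor, H: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """One neighborhood aggregation layer: H^{(l+1)} = ReLU(F H^{(l)} W)."""
    F = renormalize(A)
    return torch.relu(F @ H @ W)

# Toy usage: 4 nodes, 8-dimensional input attributes, 16 hidden units.
A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])
H0 = torch.randn(4, 8)
W0 = torch.randn(8, 16)
H1 = gcn_layer(A, H0, W0)   # embeddings after one round of neighborhood aggregation
```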

Methods

SkipGNN is a graph neural network uniquely suited for molecular interactions. SkipGNN takes as input a molecular interaction network and uses it to construct a skip graph, which is a second-order network representation capturing the skip similarity. SkipGNN then specifies a novel graph neural network architecture that fuses the original and the skip graph to accurately and robustly predict new molecular interactions. Notations are described in Table 1.

Table 1 Notation used in SkipGNN.

Problem formulation

Consider an interaction network G on N nodes representing biomedical entities \({\mathscr {V}}\) (e.g., drugs, proteins, or diseases) and M edges \({\mathscr {E}}\) representing interactions between the entities. For example, G can be a drug–target interaction network recording information on how drugs bind to their protein targets3. For every pair of entities i and j, we denote their interaction with a binary indicator \({e}_{ij} \in \{0,1\}\), indicating the experimental evidence that i and j interact (i.e., \({e}_{ij}=1\)) or the absence of evidence for interaction (i.e., \({e}_{ij}=0\)). We denote the adjacency matrix of G as \({\mathbf {A}}\), where \({\mathbf {A}}_{ij}\) is 1 if nodes i and j are connected (\({e}_{ij}=1\)) in the graph and otherwise 0 (\({e}_{ij}=0\)). Further, \({\mathbf {D}}\) is the degree matrix, a diagonal matrix, where \({\mathbf {D}}_{ii}\) is the degree of node i.

Problem

(Molecular Interaction Prediction) Given a molecular interaction network \(G=({\mathscr {V}}, {\mathscr {E}})\), we aim to learn a mapping function \(f: {\mathscr {E}} \rightarrow [0, 1]\) from edges to probabilities such that f(i, j) estimates the probability that nodes i and j interact.

Construction of the skip graph

Next, we describe skip graphs, the key novel representation of interaction networks that allows for effective use of GNNs in predicting interactions. We realize skip similarity by encouraging the GNN model to embed skipped nodes close together in the embedding space. To do that, we construct the skip graph \(G_s\), in which two-hop neighbors are connected by edges. This construction creates paths in \(G_s\) along which neural messages can be exchanged between the skipped nodes.

Formally, we use the following operator to obtain the skip graph’s adjacency matrix \({\mathbf {A}}_{{s}}\):

$$\begin{aligned} {\mathbf {A}}_{{s}}^{ij} = \left\{ \begin{array}{ll} 1~~\mathrm {if}~\exists ~k~\mathrm {s.t.}~{(i,k)}~\in ~{\mathscr {E}} ~\mathrm {and}~{(k,j)}~\in ~{\mathscr {E}}\\ 0~~\mathrm {otherwise.} \end{array} \right. \end{aligned}$$

The corresponding degree matrix is \({\mathbf {D}}_{{s}}^{ii} = \sum _{j} {\mathbf {A}}_{{s}}^{ij}.\) An efficient way to implement the skip graph is through matrix multiplication:

$$\begin{aligned} {\mathbf{A}}_{{s}} = {\mathrm{sign}}({\mathbf {A}}{\mathbf{A}}^{\mathrm{T}}), \end{aligned}$$
(1)

where \(\mathrm {sign}(x)\) is the sign function, \(\mathrm {sign}(x) = 1\) if \(x > 0\) and 0 otherwise, applied element-wise to \({\mathbf{AA}}^{{\text{T}}}\). The matrix \({\mathbf{AA}}^{{\text{T}}}\) counts the number of two-hop paths from node i to node j. Hence, if the entry for nodes i and j in \({\mathbf{AA}}^{{\text{T}}}\) is larger than 0, there exists a skipped node between i and j. We then convert every positive entry into 1 to construct the skip graph’s adjacency matrix. Given this skip graph, we proceed to describe the full SkipGNN model.
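As an illustration, Eq. (1) can be implemented in a few lines of NumPy. This is a minimal sketch (the function name is ours); whether trivial self skip-edges on the diagonal are kept is an implementation detail, and here we assume they are dropped.

```python
import numpy as np

def skip_graph(A: np.ndarray) -> np.ndarray:
    """Skip-graph adjacency A_s = sign(A A^T): entry (i, j) is 1 iff some node k
    interacts with both i and j, i.e., a two-hop path i-k-j exists."""
    two_hop_counts = A @ A.T                 # number of length-2 paths between each pair
    A_s = (two_hop_counts > 0).astype(int)   # element-wise sign for non-negative counts
    np.fill_diagonal(A_s, 0)                 # drop trivial self skip-edges (assumption)
    return A_s

# Toy example: a path graph 0-1-2; nodes 0 and 2 become skip neighbors.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(skip_graph(A))
```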

The SkipGNN model

In this section, we describe how we leverage the skip graph for link prediction. After generating the skip graph (“Construction of the skip graph” section), we propose an iterative fusion scheme that allows the skip graph and the original graph to learn from each other for better integration. Lastly, a decoder outputs a probability measuring whether the given pair of molecular entities interacts.

Iterative fusion

We want the model to automatically learn how to balance direct similarity and skip similarity in the final embedding. We design an iterative fusion scheme with aggregation gates to combine both types of similarity information. The motivation is that, to represent a biomedical entity to its fullest extent, the node embedding must capture its complicated bioactive functions through both skip and direct similarities. Hence, instead of simply concatenating the output node embeddings of the GNN on the original graph G, which captures direct similarity, and of the GNN on the skip graph \(G_s\), which captures skip similarity, we allow the two GNNs on G and \(G_s\) to interact with each other iteratively via the following propagation rules (see Fig. 2):

$$\begin{aligned} {\mathbf {H}}^{(l+1)} = \sigma (\mathrm {AGG}({\mathbf {F}}{\mathbf {H}}^{(l)} {\mathbf {W}}_{o}^{(l)},{\mathbf {F}}_{{s}} {\mathbf {S}}^{(l)}{\mathbf {W}}_{o}^{'{(l)}})) , \quad {\mathbf {S}}^{(l+1)} = \sigma (\mathrm {AGG}({\mathbf {F}}_{{s}} {\mathbf {S}}^{(l)}{\mathbf {W}}_{{s}}^{(l)},{\mathbf {F}}{\mathbf {H}}^{(l+1)} {\mathbf {W}}_{{s}}^{'(l)})), \end{aligned}$$
(2)

where \({\mathbf {F}} = \widetilde{{\mathbf {D}}}^{-\frac{1}{2}}\widetilde{{\mathbf {A}}} \widetilde{{\mathbf {D}}}^{-\frac{1}{2}}, \quad {\mathbf {F}}_{{s}} = \widetilde{{\mathbf {D}}}_{{s}}^{-\frac{1}{2}} \widetilde{{\mathbf {A}}}_{{s}}\widetilde{{\mathbf {D}}}_{{s}}^{-\frac{1}{2}}.\) Here, \({\mathbf {H}}^{(l)}\) and \({\mathbf {S}}^{(l)}\) are the node embeddings at the lth layer from the direct similarity graph G and the skip similarity graph \(G_s\), respectively, and \({\mathbf {F}}, {\mathbf {F}}_{{s}}\) are the re-normalized adjacency matrices for direct similarity and skip similarity, respectively. \({\mathbf {W}}_{o}^{(l)}, {\mathbf {W}}_{o}^{'(l)}, {\mathbf {W}}_{{s}}^{(l)}, {\mathbf {W}}_{{s}}^{'(l)}\) are trainable weight matrices for layer l. \({\mathbf {H}}^{(0)}\) and \({\mathbf {S}}^{(0)}\) are set to \({\mathbf {X}}\), the input node attributes generated with node2vec. The aggregation gate \(\mathrm {AGG}\) in Eq. (2) can be a summation, a Hadamard product, max-pooling, or another aggregation operator46. Empirically, we find that the summation gate has the best performance. \(\sigma(\cdot)\) is the activation function, and we use \(\mathrm {ReLU}(\cdot ) = \max (\cdot , 0)\) to add non-linearity to the propagation.

Figure 2

Neural architecture of SkipGNN. (Left) SkipGNN  constructs skip graph \(G_s\) (denoted by adjacency matrix \({\mathbf {A}}_{s}\)) based on the input graph G (denoted by adjacency matrix \({\mathbf {A}}\)) using Eq. (1). (Middle) Initial node embeddings, \({\mathbf {H}}^{(0)}\) and \({\mathbf {S}}^{(0)}\), are specified using side information (e.g., gene expression vectors if nodes represent genes) or generated using node2vec18. In SkipGNN, node embeddings are then propagated along edges of \(G_s\) and G and transformed through a series of computations (layers), which output powerful embeddings that can then be used for downstream prediction of interactions. Illustrated is a two-layer iterative fusion scheme. In the first layer, two GNNs with parameter weight matrices \({\mathbf {W}}_{o}^{(0)}\) and \({\mathbf {W}}_{s}^{(0)}\) (operating on \({\mathbf {A}}\) and \({\mathbf {A}}_{s}\), respectively) are fused via weight matrices \({\mathbf {W}}_{o}^{'(0)}\) and \({\mathbf {W}}_{s}^{'(0)}\) based on Eq. (2). This completes computations in the first layer of SkipGNN, producing embeddings \({\mathbf {H}}^{(1)}\) and \({\mathbf {S}}^{(1)}\). In the second layer, those embeddings are transformed via \({\mathbf {W}}_{o}^{(1)}\) and \({\mathbf {W}}_{s}^{(1)}\) using Eq. (3), resulting in final embeddings \({\mathbf {E}}\). (Right) Embeddings \({\mathbf {E}}_i\) and \({\mathbf {E}}_j\) of target nodes i and j are retrieved, concatenated, and then fed into a decoder (parameterized by \({\mathbf {W}}_{d}\)). Decoder returns \({p}_{ij}\), representing the probability that nodes i and j interact.

In each iteration, the node embedding for the original graph, \({\mathbf {H}}^{(l+1)}\), is first updated from the previous layer’s node embedding \({\mathbf {H}}^{(l)}\) combined with the skip graph embedding \({\mathbf {S}}^{(l)}\). After obtaining the updated original graph embedding \({\mathbf {H}}^{(l+1)}\), we update the skip graph embedding \({\mathbf {S}}^{(l+1)}\) in a similar fashion.

This update rule is very different from simple concatenation: it is an iterative process in which each update of the node embedding for one graph is affected by the most recent node embeddings from both graphs. This way, the two embeddings learn the best dependency structure between each other and are fused into one final embedding rather than simply concatenated. In the last layer, the final node embedding \({\mathbf {E}}\) is obtained through:

$$\begin{aligned} {\mathbf {E}} = \mathrm {AGG}({\mathbf {F}}{\mathbf {H}}^{({1})} {\mathbf {W}}_{o}^{({1})},~{\mathbf {F}}_{{s}} {\mathbf {S}}^{({1})}{\mathbf {W}}_{s}^{({1})}), \end{aligned}$$
(3)

where (1) is the index of the last layer and \(\mathrm {AGG}\) is the summation gate. As motivated above, we are interested only in up to second-order neighbors; thus we use a two-layer GNN (see Fig. 2). We do not use an activation function here because the output does not require an extra non-linear transformation before being fed into the decoder network. Empirically, we show that this fusion scheme boosts predictive performance in the “Ablation studies” section.
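To make the two-layer iterative fusion concrete, below is a minimal PyTorch sketch of Eqs. (2)–(3) with the summation gate; the module and variable names are ours and are simplified relative to the released implementation.

```python
import torch
import torch.nn as nn

class SkipGNNEncoder(nn.Module):
    """Two-layer iterative fusion over the original and skip graphs (Eqs. (2)-(3)).
    F and F_s are the renormalized adjacency matrices of the original and skip graphs."""

    def __init__(self, in_dim: int = 128, hid_dim: int = 64, out_dim: int = 16):
        super().__init__()
        # Layer-0 weights of Eq. (2); names containing 'p' denote the primed (cross-graph) weights.
        self.W_o0 = nn.Linear(in_dim, hid_dim, bias=False)
        self.W_op0 = nn.Linear(in_dim, hid_dim, bias=False)
        self.W_s0 = nn.Linear(in_dim, hid_dim, bias=False)
        self.W_sp0 = nn.Linear(hid_dim, hid_dim, bias=False)
        # Layer-1 weights of Eq. (3); no activation before the decoder.
        self.W_o1 = nn.Linear(hid_dim, out_dim, bias=False)
        self.W_s1 = nn.Linear(hid_dim, out_dim, bias=False)

    def forward(self, F: torch.Tensor, F_s: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        H0 = S0 = X                                                  # both branches start from node2vec features
        H1 = torch.relu(F @ self.W_o0(H0) + F_s @ self.W_op0(S0))    # update original-graph branch first
        S1 = torch.relu(F_s @ self.W_s0(S0) + F @ self.W_sp0(H1))    # then update skip-graph branch with H1
        return F @ self.W_o1(H1) + F_s @ self.W_s1(S1)               # final fused embeddings E (Eq. (3))
```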

SkipGNN decoder

Given the target nodes (i, j) and their corresponding node embeddings \({\mathbf {E}}_{i}\) and \({\mathbf {E}}_{j}\), we implement a neural network as a decoder that first combines \({\mathbf {E}}_{i}\) and \({\mathbf {E}}_{j}\) into an input embedding through a \(\mathrm {COMB}\) function (e.g., concatenation, sum, Hadamard product). The combined embedding is then fed into a neural network parameterized by weights \({\mathbf {W}}_{d}\) and bias b, acting as a binary classifier, to obtain the probability \({p}_{ij}\):

$$\begin{aligned} {p}_{ij} = \sigma ({\mathbf {W}}_{d} \mathrm {COMB}({\mathbf {E}}_{i}, {\mathbf {E}}_{j}) + {b}), \end{aligned}$$
(4)

where \({p}_{ij}\) represents the probability that nodes i and j interact (i.e., f(i, j)). We use concatenation as the \(\mathrm {COMB}\) function as it consistently yields the best performance across different types of networks.
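A minimal sketch of this decoder (Eq. (4)), with concatenation as the \(\mathrm {COMB}\) function, is shown below; the class name is ours and is illustrative only.

```python
import torch
import torch.nn as nn

class InteractionDecoder(nn.Module):
    """Concatenate two node embeddings and map them to an interaction probability (Eq. (4))."""

    def __init__(self, emb_dim: int = 16):
        super().__init__()
        self.W_d = nn.Linear(2 * emb_dim, 1)   # the layer's bias plays the role of b

    def forward(self, E_i: torch.Tensor, E_j: torch.Tensor) -> torch.Tensor:
        combined = torch.cat([E_i, E_j], dim=-1)                 # COMB = concatenation
        return torch.sigmoid(self.W_d(combined)).squeeze(-1)     # p_ij in [0, 1]
```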

Algorithm 1

The SkipGNN algorithm

The overall algorithm is shown in Algorithm 1. Here, we leverage only readily accessible network information (the adjacency matrix \({\mathbf {A}}\) of the network G) to predict links. In all experiments, we initialize embeddings using node2vec18 as \({\mathbf {X}} = \mathrm {node2vec}({\mathbf {A}})\).

Second, we construct the skip graph with adjacency matrix \({\mathbf {A}}_{s}\) via Eq. (1) to capture the skip-similarity principle. Next, at every step, a mini-batch of interaction pairs \({\mathscr {M}}\) with labels y is sampled. Then, two graph convolutional networks are used, one for the original graph and one for the skip graph. In the propagation step, we use iterative fusion (Eq. (2)) to naturally combine embeddings convolved on the original graph and on the skip graph, corresponding to direct and skip similarity, respectively. In the last layer, the embeddings are stored in \({\mathbf {E}}\). We then retrieve the embeddings of each node in the mini-batched pairs \({\mathscr {M}}\) and concatenate them to feed into the decoder (Eq. (4)).

During training, we optimize SkipGNN’s parameters \({\mathbf {W}}_{o}^{(l)}\), \({\mathbf {W}}_{{o}}^{'(l)}\), \({\mathbf {W}}_{{s}}^{(l)}\), \({\mathbf {W}}_{{s}}^{'(l)}\), \({\mathbf {W}}_{d}\), and b in an end-to-end manner through a binary cross-entropy loss: \({\mathscr {L}} = -\sum _{{(i,j)} \in {\mathscr {M}}} \left[ {y}_{ij}\,\log {p}_{ij} + (1 - {y}_{ij})\,\log ({1-{p}_{ij}}) \right],\) where \({y}_{ij}\) is the true label for nodes i and j that are sampled during training via mini-batching, \({(i,j)} \in {\mathscr {M}}\), and \({\mathscr {M}}\) is a mini-batch of interaction pairs. After the model is trained, it can be used to make predictions: given two entities i and j, the model predicts the probability f(i, j) that i and j interact.
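The training loop can be sketched as a single mini-batch update. The following assumes the encoder and decoder modules sketched above; the helper and argument names are ours and illustrative only.

```python
import torch

def train_step(encoder, decoder, optimizer, F, F_s, X, batch_pairs, batch_labels):
    """One mini-batch update with the binary cross-entropy loss described above."""
    optimizer.zero_grad()
    E = encoder(F, F_s, X)                        # fused embeddings for all nodes
    i, j = batch_pairs[:, 0], batch_pairs[:, 1]   # node indices of the sampled pairs in M
    p = decoder(E[i], E[j])                       # predicted interaction probabilities p_ij
    loss = torch.nn.functional.binary_cross_entropy(p, batch_labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```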

Results

We conduct a variety of experiments to investigate the predictive power of SkipGNN (“Predicting molecular interactions” section). We then study the method’s robustness to noise and missing data (“Robust learning on incomplete interaction networks” section) and demonstrate the skip similarity principle (“SkipGNN learns meaningful embedding spaces” section). Next, we conduct ablation studies to examine the contribution of each of SkipGNN’s components to the final performance (“Ablation studies” section). Finally, we investigate novel predictions made by SkipGNN (“Investigation of SkipGNN’s novel predictions” section).

Data and experimental setup

Next, we provide details on the molecular interaction datasets, baseline methods, and experimental setup.

Molecular interaction networks

We consider four publicly available network datasets. (1) BIOSNAP-DTI47 contains 5,018 drugs that target 2,325 proteins through 15,139 drug–target interactions (DTIs). (2) BIOSNAP-DDI47 consists of 48,514 drug–drug interactions (DDIs) between 1,514 drugs, extracted from drug labels and the biomedical literature. (3) HuRI-PPI48 is the human reference protein–protein interaction network generated by multiple orthogonal high-throughput yeast two-hybrid screens; we use the HI-III network, which has 5,604 proteins and 23,322 interactions. (4) Finally, DisGeNET-GDI49 collects curated gene–disease interactions (GDIs) from GWAS studies, animal models, and the scientific literature; the dataset has 81,746 interactions between 9,413 genes and 10,370 diseases. Dataset statistics are given in Table 2.

Table 2 Data statistics. ‘A’ indicates average node degree.

SkipGNN implementation and hyperparameters

We implemented SkipGNN using the PyTorch deep learning framework (the source code implementation of SkipGNN is available at https://github.com/kexinhuang12345/SkipGNN). We use a server with 2 Intel Xeon E5-2670v2 2.5 GHz CPUs, 128 GB RAM, and 1 NVIDIA Tesla P40 GPU. We set the optimization parameters as follows: the learning rate is 5e−4 with the Adam optimizer50, the mini-batch size is \(|{\mathscr {M}}| = 256\), the number of epochs is 15, and the dropout rate is 0.1. We set hyper-parameters using a 10-run random search based on the best average prediction performance on the validation set of the DTI task; we find this setup is robust across the other datasets. The hyper-parameter ranges are as follows: learning rate: [1e−3, 5e−4, 1e−4, 5e−5]; mini-batch size: [32, 64, 128, 256, 512]; dropout rate: [0, 0.05, 0.1, 0.2]; hidden size: [16, 32, 64, 128]. Specifically, we set the hidden size of the first layer to \(d^{(1)}=64\) and the hidden size of the second layer to \(d^{(2)}=16\).

Baseline methods

We compare SkipGNN to ten powerful predictors of molecular interactions from the network science and graph machine-learning fields. From machine learning, we use three direct network embedding methods: DeepWalk17, node2vec18, and struc2vec19. The last of these is conceptually distinct in that it leverages local structural information, while the former methods use random walks to learn embeddings for nodes in the network. Further, we examine five graph neural networks: VGAE32, GCN9, GIN21, JK-Net34, and MixHop11; they all use the same input encoding as SkipGNN. From network science, we consider spectral clustering20. We also use the L315 heuristic, which was recently shown to outperform over 20 network science methods for the problem of PPI prediction. Further details on the baseline methods, their implementation, and parameter selection are in the supplementary material.

Experimental setup

In all our experiments, we follow an established evaluation strategy for link prediction (e.g.,4,51). We divide each dataset into train, validation, and test sets in a 7:1:2 ratio, which yields the positive examples (molecular interactions). We generate negative counterparts by sampling from the complement of the set of positive examples; the number of negative samples is set equal to the number of positive data points. For every experiment, we conduct five independent runs with different random splits of the dataset. We select the best-performing model based on the loss value on the validation set, and the performance of the selected model is calculated on the test set. To quantify prediction performance, we use: (1) the area under the precision-recall curve (PR-AUC): \(\text {PR-AUC} = \sum _{k = 1}^{n} \mathrm {Prec}(k) \Delta \mathrm {Rec}(k),\) where k is the kth precision/recall operating point (\(\mathrm {Prec}(k), \mathrm {Rec}(k)\)); and (2) the area under the receiver operating characteristic curve (ROC-AUC): \(\text {ROC-AUC} = \sum _{k = 1}^{n} \mathrm {TP}(k) \Delta \mathrm {FP}(k),\) where k is the kth true-positive and false-positive operating point (\(\mathrm {TP}(k), \mathrm {FP}(k)\)). Higher values of PR-AUC and ROC-AUC indicate better predictive performance. In addition to PR-AUC and ROC-AUC, we rank each method in each dataset based on its PR-AUC and report the average rank of a method across the four datasets; the rank indicates the overall performance of the method relative to the others. To further demonstrate the performance gain of SkipGNN, we resort to statistical testing: for each method, we take the ROC-AUC and PR-AUC of each run on each dataset as the data samples and then compute the p value of the Wilcoxon signed-rank test between SkipGNN and the compared method.
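In practice, these metrics and the significance test can be computed with standard libraries; the following is an illustrative sketch using toy numbers only (not results from this work).

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score
from scipy.stats import wilcoxon

# y_true: binary labels of held-out pairs; y_score: predicted probabilities f(i, j).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])

pr_auc = average_precision_score(y_true, y_score)   # area under the precision-recall curve
roc_auc = roc_auc_score(y_true, y_score)            # area under the ROC curve

# Paired significance test between two methods across runs/datasets (toy scores only).
skipgnn_scores = [0.91, 0.87, 0.92, 0.90, 0.89, 0.93, 0.88, 0.90]
baseline_scores = [0.89, 0.86, 0.90, 0.88, 0.88, 0.91, 0.85, 0.87]
stat, p_value = wilcoxon(skipgnn_scores, baseline_scores)
```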

Predicting molecular interactions

We start by evaluating SkipGNN on four distinct types of molecular interactions, including drug–target interactions, drug–drug interactions, protein–protein interactions, and gene–disease interactions, and we then compare SkipGNN’s performance to that of the baseline methods.

In each interaction network, we randomly mask 30% of the interactions as holdout validation (10%) and test (20%) sets, consistent with the 7:1:2 split described above. The remaining 70% of interactions are used to train SkipGNN and each of the baselines. After training, each method is asked to predict whether pairs of entities in the test set are likely to interact.

We report the results in Table 3; the method ranks, along with the p values of the statistical tests, are provided in Table 4. We see that SkipGNN is the top-performing method out of the 11 methods across all molecular interaction networks. SkipGNN has the best predictive performance on the DTI and PPI datasets and the second-best performance on the DDI and GDI datasets, with an average rank of 1.5. In contrast, the best-performing baseline, MixHop, has an average rank of 2.5, as it is sometimes worse than JK-Net and GIN. We also see that SkipGNN’s improvement over all baselines is statistically significant (p < 0.05). To show the usefulness of the skip graph, we compare with the GCN-based baselines GCN and VGAE. We see up to a 2.7% improvement of SkipGNN over GCN and up to an 8.8% improvement over VGAE in PR-AUC. Since GCN and VGAE can only use direct similarity, this finding provides evidence that considering skip similarity and direct similarity together, as is made possible by SkipGNN, is important for accurately predicting a variety of molecular interactions. Compared to direct embedding methods, SkipGNN shows up to a 28.8% increase over DeepWalk, a 20.4% increase over node2vec, and a 15.6% increase over spectral clustering in PR-AUC. These results support previous observations4 that graph neural networks can learn more powerful network representations than direct embedding methods. Finally, all baselines vary in performance across datasets and tasks, while SkipGNN consistently yields the most powerful predictor.

Table 3 Predictive performance. SkipGNN achieves the best performance across all metrics and tasks compared to the baselines. Results of five independent runs on the DDI, PPI, DTI, and GDI tasks for state-of-the-art link prediction algorithms.
Table 4 Predictive performance ranking and statistical testing. We rank each tested method's PR-AUC on each dataset, compute the average rank, and also compute the performance difference from SkipGNN using the Wilcoxon signed-rank test. SkipGNN has the highest rank compared with the 10 other baselines, and its performance gain is statistically significant.

Robust learning on incomplete interaction networks

Next, we test SkipGNN’s performance on incomplete interaction networks. Due to knowledge gaps in biology, many of today’s interaction networks are incomplete, and it is thus crucial that methods are robust and able to perform well even when many interactions are missing.

In this experiment, we train each method on 10%, 30%, 50%, and 70% of the edges in the DTI, DDI, and PPI datasets and predict on the rest of the data (we use 10% of the test edges as a validation set for early stopping).
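Such an experiment only requires masking a fraction of the edge list before training; a minimal sketch (the helper name is ours) is shown below.

```python
import numpy as np

def subsample_edges(edges: np.ndarray, train_frac: float, seed: int = 0):
    """Keep a random `train_frac` of the interactions for training; hold out the rest."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(edges))
    n_train = int(train_frac * len(edges))
    return edges[perm[:n_train]], edges[perm[n_train:]]   # (train edges, held-out edges)

# Example: keep 10% of a toy edge list for training.
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5],
                  [5, 6], [6, 7], [7, 8], [8, 9], [9, 0]])
train_edges, heldout_edges = subsample_edges(edges, train_frac=0.1)
```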

Results in Fig. 3 show that SkipGNN gives the most robust results among all methods. In all tasks, SkipGNN achieves strong performance even when it has access to only 10% of the interactions. Further, at almost every percentage point, SkipGNN is better than the baselines. In addition, we see that VGAE is not robust, as its performance drops to around 0.5 PR-AUC in highly incomplete settings on the DTI and DDI tasks. The performance of node2vec and GCN steadily improves as the percentage of seen edges increases. Further, while spectral clustering is robust to incomplete data, its performance varies substantially across tasks. We conclude that SkipGNN is robust and especially appropriate for data-scarce networks.

Figure 3

Predictive performance as a function of network incompleteness. SkipGNN provides robust results across varying fractions of missing edges. Shown is the five-fold average, with a 95% confidence interval, of PR-AUC against various fractions of missing edges on four prediction tasks: drug–target interaction prediction (DTI), drug–drug interaction prediction (DDI), protein–protein interaction prediction (PPI), and gene–disease interaction prediction (GDI), for node2vec, Spectral Clustering (SC), Variational Graph Auto-Encoder (VGAE), Graph Convolutional Network (GCN), and SkipGNN. We omit DeepWalk as it has similar performance to node2vec. SkipGNN consistently shows the best performance even when networks are highly incomplete.

SkipGNN learns meaningful embedding spaces

Next, we visualize the embeddings learned by GCN and SkipGNN to investigate whether SkipGNN can better capture the structure of interaction networks than GCN. For that, we use the DTI and GDI networks, in which drugs/diseases are linked to associated proteins/genes. We use t-SNE52 and visualize the learned embeddings in Fig. 4 (DTI network) and Fig. 5 (GDI network). Note that both GCN and SkipGNN use the same input embeddings, which means the only difference is whether or not skip similarity is used.

Figure 4

Visualizations of the drug–target interaction network. GCN does not distinguish drugs from target genes, as it only captures direct similarity, whereas SkipGNN is able to distinguish drug and target gene embeddings, confirming its ability to capture skip similarity. We use GCN and SkipGNN on the drug–target interaction dataset to learn drug/target embeddings, which are visualized using t-SNE.

Figure 5

Visualizations of the gene–disease interaction network. GCN does not distinguish diseases from genes, as it only captures direct similarity, whereas SkipGNN is able to distinguish disease and gene embeddings, confirming its ability to capture skip similarity. We use GCN and SkipGNN on the gene–disease interaction dataset to learn gene/disease embeddings, which are visualized using t-SNE.

First, we observe that GCN cannot distinguish between different types of biomedical entities (i.e., drugs vs. proteins and disease vs. genes). In contrast, SkipGNN can successfully separate the entities, as evidenced by distinguishable groups of points of the same color in the t-SNE visualizations. This observation confirms that SkipGNN has a unique ability to capture the skip similarity whereas GCN cannot. This is because GCN forces embeddings of connected drug-protein/gene–disease pairs to be similar and thus it embeds those pairs close together in the embedding space. However, by doing so, GCN conflates drugs with proteins and genes with diseases. In contrast, SkipGNN generates a biologically meaningful embedding space in which drugs are distinguished from proteins (or, genes from diseases) while drugs are still positioned in the embedding space close to proteins to which they bind (or, in the case of GDI network, diseases are positioned close to relevant disease-associated genes).

We also calculate the silhouette score of the t-SNE plots, which measures inter-cluster and intra-cluster distances and assesses the quality of a clustering. A higher value indicates that a sample is better matched to its own cluster and more poorly matched to neighboring clusters. Here, SkipGNN has a silhouette score of 0.114 on DTI, whereas GCN has a score of 0.014; on GDI, SkipGNN has a score of 0.079 and GCN a score of 0.018. The up to eight-fold increase in silhouette score suggests that SkipGNN can better distinguish the entities than GCN.
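This analysis can be reproduced with standard tooling; below is an illustrative sketch using placeholder embeddings and labels rather than the actual learned embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Placeholder embeddings (n_nodes x d) and entity-type labels (0 = drug, 1 = protein).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
labels = np.repeat([0, 1], 100)

coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)   # 2-D t-SNE projection
score = silhouette_score(coords, labels)   # higher score = entity types are better separated
```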

Further, we find that GCN and its graph convolutional variants cannot capture skip similarity because they aggregate neural messages only from direct (i.e., immediate) neighbors in the interaction network. SkipGNN solves this problem by passing and aggregating neural messages from direct as well as indirect neighbors, thereby explicitly capturing skip similarity.

Ablation studies

To show that each component of SkipGNN plays an important role in its final performance, we conduct a series of ablation studies. SkipGNN has four key components, and we study how the method's performance changes when we remove or replace each of them:

  • -fusion replaces SkipGNN’s fusion scheme with a simple concatenation of the node embeddings generated by GCN.

  • -skipGraph removes skip graph and degenerates to GCN.

  • -Weighted-L1 replaces the summation gate in Eq. (2) with the weighted-L1 gate, \(\mathrm {AGG}(A, B) = \vert A-B \vert\), where \(\vert \cdot \vert\) is the absolute value operator.

  • -Hadamard replaces the summation gate with Hadamard operator ‘\(*\)‘ in Eq. (2) such that \(\mathrm {AGG}(A,B) = A * B\).

Table 5 shows the results of deactivating each of these components, one at a time. We find that -fusion outperforms -skipGraph (i.e., GCN) by a large margin. This finding identifies the skip graph as a key driver of the performance improvement. Further, we find that our iterative fusion scheme is important, indicating that successful methods need to integrate both direct and skip similarity in interaction networks. Next, we see that the weighted-\(L_1\) gate has comparable or worse performance than the summation gate and that the Hadamard operator performs the worst, suggesting that SkipGNN’s summation gate is the best-performing aggregation function. Altogether, we conclude that all of SkipGNN’s components are necessary for its strong performance.

Table 5 Results of ablation experiments. SkipGNN's full model configuration achieves the best result. Ablation study results of five independent runs on the DDI, PPI, and DTI tasks.

Investigation of SkipGNN’s novel predictions

The main goal of link prediction on graphs is to find novel hits that do not exist in the dataset. We conduct a literature search and find that SkipGNN is able to discover novel hits. We select pairs that do not interact in the original dataset but are flagged as interactions by our model. We then take the 10 most confident predicted interactions, query them against a literature database, and check whether there is evidence supporting our findings. We find promising results for the DDI task (Table 6): out of the 10 top-ranked interaction pairs, we are able to find literature evidence supporting 6 pairs.

Table 6 Novel predictions of drug–drug interactions. Shown are top-10 predicted drug–drug interactions together with the relevant literature providing evidence for predictions.

For example, for the interaction between Warfarin and Clozapine, ref.53 reports that “Clozapine increase the concentrations of commonly used drugs in elderly like digoxin, heparin, phenytoin and Warfarin by displacing them from plasma protein. This can lead to increase in respective adverse effects with these medications.” The manufacturer59 also reports that “Clozapine may displace Warfarin from plasma protein-binding sites. Increased levels of unbound Warfarin could result and could increase the risk of hemorrhage.” As another example, for Warfarin and Ivacaftor, ref.54 conducts a DDI study and reports that “caution and appropriate monitoring are recommended when concomitant substrates of CYP2C9, CYP3A and/or P-gp are used during treatment with Ivacaftor, particularly drugs with a narrow therapeutic index, such as Warfarin.” Finally, we provide the top 10 outputs for the DTI, PPI, and GDI tasks in Appendix 3.

Discussion

We introduced SkipGNN, a novel graph neural network for predicting molecular interactions. The architecture of SkipGNN is motivated by a principle of connectivity that we call skip similarity. Remarkably, we found that skip similarity allows SkipGNN to capture the structural and evolutionary forces that govern molecular interaction networks much better than is possible with current graph neural networks. SkipGNN achieves superior and robust performance on a variety of key prediction tasks in interaction networks and performs well even when networks are highly incomplete.

There are several future directions. We focused here on networks in which all edges are of the same type. As SkipGNN is a general graph neural network, it would be interesting to adapt SkipGNN to heterogeneous networks, such as drug–gene–disease networks. Another fruitful direction would be to implement skip similarity in other types of biological networks.