SCMFMDA: Predicting microRNA-disease associations based on similarity constrained matrix factorization

Lei Li; Zhen Gao; Yu-Tian Wang; Ming-Wen Zhang; Jian-Cheng Ni; Chun-Hou Zheng; Yansen Su

doi:10.1371/journal.pcbi.1009165

Abstract

miRNAs are small non-coding RNAs involved in many complex biological processes. Numerous studies have suggested that miRNAs are closely associated with many human diseases. In this study, we propose a computational model based on Similarity Constrained Matrix Factorization for miRNA-Disease Association Prediction (SCMFMDA). To effectively combine different disease and miRNA similarity data, we applied the similarity network fusion algorithm to obtain an integrated disease similarity (composed of disease functional similarity, disease semantic similarity and disease Gaussian interaction profile kernel similarity) and an integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity and miRNA Gaussian interaction profile kernel similarity). In addition, L₂ regularization terms and similarity constraint terms were added to the traditional Nonnegative Matrix Factorization algorithm to predict disease-related miRNAs. SCMFMDA achieved AUCs of 0.9675 and 0.9447 under global leave-one-out cross validation and five-fold cross validation, respectively. Furthermore, case studies on two common human diseases were carried out to demonstrate the prediction accuracy of SCMFMDA: 47 and 46 of the top 50 predicted miRNAs for colon neoplasms and lung neoplasms, respectively, were confirmed by experimental reports, indicating that SCMFMDA is effective for predicting relationships between miRNAs and diseases.

Author summary

Numerous studies have suggested that miRNAs are closely associated with many human diseases, so predicting potential associations between miRNAs and diseases can contribute to the diagnosis and treatment of diseases. Computational models for discovering unknown miRNA-disease associations make this prediction more productive and effective. We propose SCMFMDA to obtain more accurate predictions by applying similarity network fusion to fuse multi-source disease and miRNA information and by using similarity constrained matrix factorization to make predictions based on this biological information. Global leave-one-out cross validation and five-fold cross validation were used to evaluate our model. SCMFMDA achieved AUCs of 0.9675 and 0.9447, respectively, which were clearly higher than those of previous computational models. Furthermore, we carried out case studies on two important human diseases, colon neoplasms and lung neoplasms, for which 47 and 46 of the top 50 predicted miRNAs were confirmed by experimental reports. These results indicate that SCMFMDA can be regarded as an effective way to discover unverified miRNA-disease associations.

Citation: Li L, Gao Z, Wang Y-T, Zhang M-W, Ni J-C, Zheng C-H, et al. (2021) SCMFMDA: Predicting microRNA-disease associations based on similarity constrained matrix factorization. PLoS Comput Biol 17(7): e1009165. https://doi.org/10.1371/journal.pcbi.1009165

Editor: Quan Zou, University of Electronic Science and Technology, CHINA

Received: April 5, 2021; Accepted: June 8, 2021; Published: July 12, 2021

Copyright: © 2021 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting Information files.

Funding: This work was supported by the National Natural Science Foundation of China through grants 61873001 (C.Z., Y.W.), U19A2064 (C.Z.), 61872220 (C.Z.) and 11701318 (Y.W.), the Natural Science Foundation of Shandong Province grant ZR2020KC022 (J.N., Y.W., Z.G., L.L) and the Open Project of Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, No. MMC202006 (Y.W., L.L). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

MicroRNAs (miRNAs) are a class of 17-24 nt non-coding RNAs that play a pivotal role in controlling gene expression through RNA cleavage or translational repression [1–3]. Lin-4 was the first miRNA to be identified experimentally, by Lee et al. [4] in 1993. Since then, a large number of miRNAs have been discovered experimentally [4,5], in species ranging from viruses to animals and plants [6]. Because miRNAs regulate the expression of a large number of target genes, the miRNA pathway as a whole plays a key role in gene expression control [7–9]. miRNAs are involved in several crucial biological processes, such as cell development, differentiation and proliferation [10]. Dysregulation of miRNAs can result in developmental defects and is also associated with the progression of diseases [11]. In addition, numerous studies have indicated that miRNAs are connected with a series of human neoplasms, including lung neoplasms [12] and prostate neoplasms [13]. Identifying disease-associated miRNAs can therefore deepen our understanding of the genetic causes of complex diseases. Many connections between miRNAs and diseases have been found by traditional experiments in the past few years [14,15]. Traditional experimental approaches can identify connections between miRNAs and diseases, but they are time-consuming, laborious and prone to failure. Effective and stable computational methods are therefore needed to reveal potential relationships between miRNAs and diseases, as they can provide increasingly reliable miRNA-disease connections [16].

In recent years, a great many computational algorithms and methods have been applied to predict potential miRNA-disease relationships [17,18]. For example, Jiang et al. [19] proposed a model that used the human phenome-microRNAome network to predict potential interactions between functionally similar miRNAs and phenotypically similar diseases. However, the predictive performance of the model was lower than expected because of the high false positive and false negative rates in the miRNA-target associations. Later, the model WBSMDA [20] introduced Gaussian interaction profile similarity to enrich the similarity information of miRNAs and diseases. WBSMDA could also predict potential relationships for new miRNAs and new diseases without any verified associations. In CMFMDA [21], collaborative matrix factorization was applied to predict miRNA-disease relationships, and abundant biological information could also be used to uncover unknown interactions. The model EGBMMDA [22] used decision tree learning to discover novel miRNA-disease interactions by integrating verified miRNA-disease connections, miRNA functional similarity and disease semantic similarity; an informative feature vector was constructed from multiple measures to train a regression tree under the gradient boosting framework. Zhao et al. [23] applied adaptive boosting to identify unverified miRNA-disease associations in the ABMDA model, and used k-means clustering on the negative samples to perform random sampling, which balanced the positive and negative samples. The BHCMDA [24] model used the biased heat conduction (BHC) algorithm to predict unknown connections between miRNAs and diseases by combining the miRNA similarity matrix, the disease similarity matrix and the miRNA-disease association matrix. The probabilistic matrix factorization (PMF) algorithm was used in the IMIPMF [25] model to infer potential miRNA-disease interactions. Because PMF is widely used in recommender systems, it could effectively use all available information to recommend miRNAs that are strongly associated with a given disease.

Recently, methods based on random walk have been proposed and have yielded more accurate predictions. Chen et al. [26] used the random walk with restart algorithm to construct the RWRMDA model. Because prediction based on global network similarity performs better than prediction based on local network similarity [27,28], RWRMDA employed global network similarity to identify likely interactions between miRNAs and diseases. Unfortunately, RWRMDA is not applicable to diseases without known associated miRNAs. Shi et al. [29] used the functional links between human disease genes and miRNA targets to devise a novel model, in which a random walk algorithm and a global network distance measure were applied to search for likely relationships between miRNAs and diseases. Liu et al. [30] also used the random walk with restart algorithm to improve prediction results, applying it to a heterogeneous graph built from disease similarity and miRNA similarity. Luo et al. [31] employed an imbalanced bi-random walk method on a heterogeneous network containing miRNA and disease information to identify likely miRNA-disease interactions. Niu et al. [32] applied the random walk with restart algorithm to extract miRNA features from an integrated miRNA similarity network in the RWBRMDA model; these features were then used by a binary logistic regression algorithm to predict potential miRNA-disease associations.

To obtain reliable and accurate predictions, machine learning-based methods have increasingly been used to predict unknown miRNA-disease associations. For instance, the model RBMMMDA [33] used a restricted Boltzmann machine to predict multiple types of miRNA-disease associations. RBMMMDA could identify not only novel associations between miRNAs and diseases but also the corresponding association types. The model PBMDA [34] constructed a heterogeneous graph consisting of different interlinked sub-graphs and adopted a depth-first search algorithm to find potential miRNA-disease associations. PBMDA could serve as a useful computational tool to accelerate the prediction of miRNA-disease interactions. The model DNRLMF-MDA [35] combined dynamic neighborhood regularization with logistic matrix factorization to predict potential miRNA-disease relationships. DNRLMF-MDA applied the logistic matrix factorization algorithm to model the association probability between miRNAs and diseases, and then used dynamic neighborhood regularization to improve predictive performance. Peng et al. [36] proposed the model MDA-CNN for identifying miRNA-disease connections. miRNA-disease interaction features were first captured by a three-layer network. An auto-encoder was then employed to identify salient miRNA-disease feature combinations. After these feature representations were reduced, a convolutional neural network used them to predict the final results. The machine learning-based model MLMDA [37] was proposed by Zheng et al. to predict unknown miRNA-disease relationships. A k-mer sparse matrix was used to extract miRNA sequence information, which was then integrated with miRNA and disease similarity information to construct feature vectors. A deep auto-encoder neural network (AE) and a random forest classifier then used these feature vectors to calculate the prediction probability. The NCMCMDA [38] model integrated a neighborhood constraint with the matrix completion algorithm to turn the recovery task into an optimization problem, and applied the fast iterative shrinkage-thresholding algorithm to recover missing interactions between miRNAs and diseases. Zhang et al. [39] proposed the computational model MSFSP to predict miRNA-disease interactions more accurately. MSFSP first integrated various similarity information to construct the miRNA similarity and the disease similarity. The miRNA and disease similarity matrices and the verified miRNA-disease association matrix were then used to build a weighted network of miRNA-disease connections. The final prediction labels were calculated by weighting the miRNA and disease space projection scores. Ji et al. [40] proposed the SVAEMDA model to infer more disease-related miRNAs, which used miRNA similarity and disease similarity to obtain representations of miRNAs and diseases. In addition, a variational autoencoder-based predictor was trained to predict unknown miRNA-disease interactions; it combined verified miRNA-disease interactions with the miRNA and disease representations to generate the feature vectors.

Because previous models have several limitations, we present a novel model based on Similarity Constrained Matrix Factorization for miRNA-Disease Association Prediction (SCMFMDA). To obtain rich disease similarity data, we applied the similarity network fusion algorithm to integrate disease functional similarity, disease semantic similarity and disease Gaussian interaction profile kernel similarity. Similarly, miRNA similarity data were obtained by applying similarity network fusion to integrate miRNA functional similarity, miRNA sequence similarity and miRNA Gaussian interaction profile kernel similarity. In addition, we added L₂ regularization terms and similarity constraint terms to the standard Nonnegative Matrix Factorization (NMF) method to predict unknown miRNA-disease associations. To evaluate the effectiveness of SCMFMDA, global leave-one-out cross validation and five-fold cross validation were carried out on the verified miRNA-disease association data downloaded from HMDD v2.0 [41]; SCMFMDA achieved AUC values of 0.9675 and 0.9447, respectively. Furthermore, we performed case studies on colon neoplasms and lung neoplasms, and used the miR2Disease [42] and dbDEMC v2.0 [43] databases to validate the results, which achieved high confirmation ratios. These experimental results show that SCMFMDA is effective for inferring possible relationships between miRNAs and diseases.

Materials

Human miRNA-disease associations

In this study, we downloaded verified human miRNA-disease association information from the HMDD v2.0 database, which includes 5430 known associations between 383 diseases and 495 miRNAs. For computational convenience, we constructed an adjacency matrix A∈R^(nd×nm) to represent the verified miRNA-disease associations, where nd and nm denote the number of diseases and miRNAs, respectively. We use a_ij to denote the (i,j)th element of A: a_ij is set to 1 if disease d_i is related to miRNA m_j, and to 0 otherwise.
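The construction of the adjacency matrix can be sketched in Python as follows. This is a minimal illustration on toy data; the function name `build_adjacency` and the disease and miRNA labels are hypothetical and are not taken from HMDD v2.0.

```python
import numpy as np

def build_adjacency(pairs, diseases, mirnas):
    """Return A (nd x nm) with A[i, j] = 1 if disease i is linked to miRNA j."""
    d_index = {d: i for i, d in enumerate(diseases)}
    m_index = {m: j for j, m in enumerate(mirnas)}
    A = np.zeros((len(diseases), len(mirnas)), dtype=float)
    for disease, mirna in pairs:
        A[d_index[disease], m_index[mirna]] = 1.0
    return A

# Toy labels (hypothetical placeholders, not real HMDD entries)
diseases = ["d1", "d2", "d3"]
mirnas = ["m1", "m2", "m3", "m4"]
pairs = [("d1", "m1"), ("d1", "m3"), ("d2", "m2"), ("d3", "m4")]
A = build_adjacency(pairs, diseases, mirnas)
```

With the real data, `A` would have 383 rows and 495 columns and contain 5430 ones.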

Disease functional similarity

Phenotypically similar diseases tend to be associated with similar genes. Therefore, disease functional similarity can be calculated from gene functional information. The log-likelihood score (LLS) represents the probability of a functional linkage between two genes; it can be downloaded from the HumanNet database [44] and normalized as follows: (1) where LLS(g_a, g_b) denotes the LLS between gene g_a and gene g_b, LLS_max and LLS_min are the maximum and minimum LLS in the HumanNet database, and LLS_n(g_a, g_b) represents the normalized LLS.

Then, the gene functional similarity score can be calculated by the following equation: (2) where S_HumanNet represents the set of all links between genes in the HumanNet database, and e(a,b) indicates the link between gene g_a and gene g_b.

Furthermore, the functional similarity score between gene g and gene set G is defined as follows: (3)

Disease-gene association data can be obtained from SIDD [45] and are used to calculate the disease functional similarity SD₁ by the following equation: (4)

Disease semantic similarity

Following a previous study [46], Medical Subject Headings (MeSH) descriptors can be used to calculate disease semantic similarity. A Directed Acyclic Graph (DAG) is adopted to describe the relationships among diseases. Specifically, DAG(D) = (D,T(D),E(D)) represents the DAG of disease D, in which T(D) denotes the node set containing D itself and its ancestor nodes, and E(D) denotes the edge set containing the direct edges from parent nodes to their child nodes. The semantic value of disease D can then be calculated as follows: (5) where the semantic contribution of disease d to D is calculated as follows: (6) Here, Δ is the semantic contribution factor, which is set to 0.5 based on previous literature [47].

Two diseases are assumed to be similar if large parts of their DAGs are shared. Therefore, the semantic similarity DS₁(d_i, d_j) between disease d_i and disease d_j is defined as follows: (7)
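The DAG-based calculation can be illustrated with a short Python sketch. It assumes the form commonly used in the literature for this kind of semantic similarity: a disease contributes 1 to itself, each ancestor receives Δ times the largest contribution among its children, the semantic value is the sum of all contributions, and the similarity is the shared contribution divided by the sum of the two semantic values. This assumed form may differ in detail from Eqs (5)–(7), and the toy DAG and function names are hypothetical.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of `disease` itself and of each of its ancestors to `disease`."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            # Keep the largest contribution reaching this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib

def semantic_similarity(d_i, d_j, parents, delta=0.5):
    """Shared contributions divided by the sum of both semantic values."""
    c_i = semantic_contributions(d_i, parents, delta)
    c_j = semantic_contributions(d_j, parents, delta)
    shared = set(c_i) & set(c_j)
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))

# Toy DAG (hypothetical): child -> list of parents
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
sim_bc = semantic_similarity("B", "C", parents)
sim_dd = semantic_similarity("D", "D", parents)
```

In this toy DAG, B and C share only the ancestor A, so their similarity is well below 1, whereas any disease compared with itself has similarity 1.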

According to a previous study [48], a disease that appears in fewer DAGs may be more specific and should therefore receive a higher semantic contribution. Consequently, different diseases located in the same layer of a DAG may have different contribution values. Specifically, the semantic contribution of disease d to D can be calculated in an alternative way as follows: (8)

Correspondingly, the semantic score of disease D and semantic similarity DS₂(d_i, d_j) between disease d_i and disease d_j can be calculated as follows: (9) (10)

Finally, we integrated DS₁ and DS₂ to calculate the final disease semantic similarity SD₂(d_i, d_j) between disease d_i and disease d_j using the following equation: (11)

miRNA functional similarity

The calculation of miRNA functional similarity [49,50] is based on the assumption that functionally similar miRNAs tend to be linked with phenotypically similar diseases, and vice versa. We downloaded miRNA functional similarity data from http://www.cuilab.cn/files/images/cuilab/misim.zip. We constructed a matrix SM₁ with nm rows and nm columns to store this information, where the element SM₁(m_i, m_j) represents the functional similarity score between miRNA m_i and miRNA m_j.

miRNA sequence similarity

We used the Needleman-Wunsch algorithm to calculate miRNA sequence similarity; the corresponding miRNA sequence information can be obtained from the miRBase database [51]. As with miRNA functional similarity, we constructed a matrix SM₂∈R^(nm×nm) to store the sequence similarity information, where SM₂(m_i, m_j) is the sequence similarity score between miRNA m_i and miRNA m_j.
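For readers unfamiliar with the Needleman-Wunsch algorithm, the following Python sketch computes a global alignment score by dynamic programming. The scoring scheme (match +1, mismatch −1, gap −1) is an assumption made for illustration, since the scoring parameters and any normalization of the score into a similarity are not specified here; the example sequences are arbitrary.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (scoring values are assumed)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

# Arbitrary short RNA strings used only to exercise the function
score_same = needleman_wunsch_score("UGAGGUAG", "UGAGGUAG")
score_gap = needleman_wunsch_score("ACGU", "ACU")
```

A raw score of this kind would still need to be rescaled (for example to the range 0 to 1) before being stored in SM₂; how that rescaling is done is left open in this sketch.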

Gaussian interaction profile kernel similarity for diseases and miRNAs

Following previous studies [49,50], because functionally similar miRNAs are likely to be linked with phenotypically similar diseases, the Gaussian interaction profile (GIP) kernel similarity can be calculated and used to represent miRNA similarity and disease similarity. Specifically, a binary vector K(d_i) is constructed to represent the interaction profile of disease d_i, indicating whether d_i has a known association with each miRNA. The GIP kernel similarity SD₃(d_i, d_j) between disease d_i and disease d_j is then calculated by the following equations: (12) (13)

In the same way, the GIP kernel similarity SM₃(m_i, m_j) between miRNA m_i and miRNA m_j can be calculated by the following formulas: (14) (15) where the binary vector K(m_i) represents the interaction profile of miRNA m_i, indicating whether m_i has a known association with each disease, and the parameter ρ_m controls the kernel bandwidth.
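The GIP kernel can be sketched in Python as below. The sketch assumes the widely used form exp(−ρ‖K(x_i) − K(x_j)‖²), with the bandwidth ρ obtained by dividing an original bandwidth (here `rho_prime`, assumed to be 1) by the mean squared norm of the interaction profiles. This assumed form may differ in detail from Eqs (12)–(15), and the toy matrix is hypothetical.

```python
import numpy as np

def gip_kernel(profiles, rho_prime=1.0):
    """GIP kernel similarity between the rows of a binary profile matrix."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = (profiles ** 2).sum(axis=1)
    # Bandwidth normalized by the average squared profile norm
    # (undefined if every profile is all zeros).
    rho = rho_prime / sq_norms.mean()
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-rho * np.maximum(sq_dist, 0.0))

# Toy association matrix: 3 diseases (rows) x 4 miRNAs (columns)
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 1, 0]], dtype=float)
SD3 = gip_kernel(A)     # disease-disease similarity from the rows of A
SM3 = gip_kernel(A.T)   # miRNA-miRNA similarity from the columns of A
```

Diseases 1 and 3 in the toy matrix have identical interaction profiles, so their GIP similarity is 1.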

Methods

Overview

SCMFMDA consists of two major parts: first, similarity network fusion is applied to obtain the integrated disease similarity and the integrated miRNA similarity; second, the known miRNA-disease associations and the integrated similarities are used in similarity constrained matrix factorization to infer unknown miRNA-disease associations. The flow chart of SCMFMDA is shown in Fig 1.

Fig 1. Flow chart of SCMFMDA.

https://doi.org/10.1371/journal.pcbi.1009165.g001

Integrating similarity for diseases and miRNAs

The similarity between two diseases can be represented by disease functional similarity, disease semantic similarity and disease GIP kernel similarity. Similarly, miRNA functional similarity, miRNA sequence similarity and miRNA GIP kernel similarity can be used to represent the similarity between miRNAs. Here, the similarity network fusion (SNF) [52] method is applied to integrate the various similarities for diseases and miRNAs. According to the previous study, the SNF process can be expressed as an iterative update of similarity matrices. The main steps of using SNF to integrate the different disease similarities SD_n, n = 1,2,3 are as follows.

In the first step, we calculated the normalized weight matrix P_n of each similarity network as follows: (16)

In the second step, we used the k nearest neighbor (KNN) algorithm to measure the local relationships in each similarity network. The corresponding matrix K_n is obtained as follows: (17) where N_i indicates the number of neighbors of the disease.

In the third step, we applied SNF to integrate the normalized weight matrix P_n and the local relationship matrix K_n as follows: (18)

Because we had three different disease similarity networks (disease functional similarity, disease semantic similarity and disease GIP kernel similarity), m was equal to 3. After the iterative updates, the final disease similarity matrix S^d is obtained as follows: (19)

Similarly, the SNF algorithm is applied to obtain the final miRNA similarity matrix S^m.
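The three SNF steps can be sketched in Python as follows. This is a simplified illustration based on the general SNF scheme rather than a reproduction of Eqs (16)–(19): the full kernel puts 1/2 on the diagonal and scales each row's off-diagonal entries to sum to 1/2, the local kernel keeps the k largest entries per row, and each network is updated by propagating the average of the other networks. The per-iteration re-normalization used in some SNF implementations is omitted, and the values of `k` and `iterations`, the final symmetrization and the random toy matrices are all assumptions.

```python
import numpy as np

def full_kernel(W):
    """Diagonal 1/2; off-diagonal entries of each row scaled to sum to 1/2."""
    W = np.asarray(W, dtype=float)
    off = W - np.diag(np.diag(W))
    # Assumes every row has a positive off-diagonal sum.
    P = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k):
    """Keep each row's k largest similarities and normalize the row to sum to 1."""
    W = np.asarray(W, dtype=float)
    K = np.zeros_like(W)
    for i, row in enumerate(W):
        idx = np.argsort(row)[-k:]
        K[i, idx] = row[idx] / row[idx].sum()
    return K

def snf(similarities, k=3, iterations=10):
    """Fuse several similarity matrices defined over the same set of items."""
    P = [full_kernel(W) for W in similarities]
    K = [local_kernel(W, k) for W in similarities]
    m = len(P)
    for _ in range(iterations):
        total = sum(P)
        # Each network diffuses the average of the *other* networks.
        P = [K[n] @ ((total - P[n]) / (m - 1)) @ K[n].T for n in range(m)]
    fused = sum(P) / m
    return (fused + fused.T) / 2.0  # symmetrize the result

rng = np.random.default_rng(0)

def random_similarity(n):
    X = rng.random((n, n))
    S = (X + X.T) / 2.0
    np.fill_diagonal(S, 1.0)
    return S

sims = [random_similarity(6) for _ in range(3)]  # stand-ins for SD1, SD2, SD3
fused = snf(sims, k=3, iterations=5)
```

The same function would be called with the three miRNA similarity matrices to obtain the fused miRNA similarity.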

Similarity constrained matrix factorization

After obtaining the integrated disease similarity and miRNA similarity, similarity constrained matrix factorization is adopted to uncover unknown miRNA-disease interactions; Fig 2 shows the details. SCMFMDA factorizes the matrix A∈R^(nd×nm) into U∈R^(nd×γ) and V∈R^(nm×γ), where γ denotes the dimension of the disease and miRNA features in the low-rank space. Specifically, the association between a disease and a miRNA is approximately equal to the inner product of the disease feature vector and the miRNA feature vector: a_ij ≈ u_i v_j^T, where u_i and v_j represent the ith row of U and the jth row of V, respectively. The corresponding objective function is as follows: (20)

Then, L₂ regularization terms on u_i and v_j are added to Eq (20) to address the overfitting problem: (21) where σ is the regularization parameter for u_i and v_j.

Fig 2. The details of similarity constrained matrix factorization.

https://doi.org/10.1371/journal.pcbi.1009165.g002

According to previous studies [53,54], the geometric properties of data points may be preserved when they are mapped from a high-rank space into a low-rank space. Because the disease similarity S^d and the miRNA similarity S^m reflect the geometric structure of the data points, we introduce the similarity constraint terms S_U and S_V as follows: (22) (23) where S^d_ij represents the similarity between disease d_i and disease d_j, and S^m_ij denotes the similarity between miRNA m_i and miRNA m_j. Because the similarity between two data points reflects how close they are, S_U incurs a heavy penalty if two similar diseases d_i and d_j are mapped far apart in the disease feature space. Minimizing S_U therefore preserves the geometric structure of the disease data points, so that similar diseases are mapped close to each other in the low-dimensional space. The same holds for miRNAs. Hence, the objective function of SCMFMDA is obtained by adding S_U and S_V to Eq (21) as follows: (24) where ε is a hyperparameter that controls the smoothness of similarity consistency.
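To make the structure of the objective concrete, the following Python sketch evaluates one plausible form of it: a squared reconstruction error, L₂ penalties weighted by σ, and similarity-weighted squared distances between feature vectors weighted by ε. The factors of 1/2 and the exact weighting are assumptions and may differ from Eq (24); the function names are hypothetical.

```python
import numpy as np

def pairwise_penalty(S, F):
    """sum_ij S[i, j] * ||F[i] - F[j]||^2, computed directly."""
    diff = F[:, None, :] - F[None, :, :]
    return float((S * (diff ** 2).sum(axis=2)).sum())

def objective(A, U, V, S_d, S_m, sigma, eps):
    """Assumed objective: fit term + L2 terms + similarity constraint terms."""
    fit = 0.5 * np.sum((A - U @ V.T) ** 2)
    reg = 0.5 * sigma * (np.sum(U ** 2) + np.sum(V ** 2))
    sim = 0.5 * eps * (pairwise_penalty(S_d, U) + pairwise_penalty(S_m, V))
    return float(fit + reg + sim)

# Two items with similarity 1 whose feature vectors are one unit apart
S_toy = np.array([[0.0, 1.0], [1.0, 0.0]])
F_far = np.array([[0.0, 0.0], [1.0, 0.0]])
F_same = np.ones((2, 2))
penalty_far = pairwise_penalty(S_toy, F_far)
penalty_same = pairwise_penalty(S_toy, F_same)
```

The toy values show the intended behaviour of the constraint: two similar items pay a penalty when their feature vectors differ and none when the vectors coincide.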

Optimization algorithm

In this section, we present an efficient optimization algorithm to minimize the objective function of SCMFMDA. First, the partial derivatives of L with respect to u_i and v_j are calculated as follows: (25) where A(i,:) denotes the ith row of matrix A. (26) where A(:,j) denotes the jth column of matrix A.

Then, the second derivatives of L with respect to u_i and v_j are calculated by the following equations: (27) (28)

According to Newton’s method, u_i and v_j can be updated iteratively as follows: (29) (30)

Hence, u_i and v_j can be updated by the following formulas: (31) (32)

The updates of u_i and v_j stop when the convergence condition is met. The prediction matrix is then obtained from the updated u_i and v_j.

(33)

Each entry of the prediction matrix denotes the association probability between disease d_i and miRNA m_j; the higher the score, the more likely the association.
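The alternating Newton-style updates can be sketched in Python as follows. The sketch assumes the objective ½‖A − UVᵀ‖² + (σ/2)(‖U‖² + ‖V‖²) + (ε/2)Σ S^d_ij‖u_i − u_j‖² + (ε/2)Σ S^m_ij‖v_i − v_j‖² with symmetric similarity matrices. Under that assumption the objective is quadratic in each row, so one Newton step gives the exact row minimizer. The constant factors, the fixed iteration count used in place of a convergence test, the absence of a nonnegativity constraint and the toy data are all assumptions, so the update may differ from Eqs (31)–(32).

```python
import numpy as np

def loss(A, U, V, S_d, S_m, sigma, eps):
    """Assumed SCMF objective (see the lead-in for the exact form)."""
    def penalty(S, F):
        sq = (F ** 2).sum(axis=1)
        dist = sq[:, None] + sq[None, :] - 2.0 * F @ F.T
        return (S * dist).sum()
    return (0.5 * np.sum((A - U @ V.T) ** 2)
            + 0.5 * sigma * (np.sum(U ** 2) + np.sum(V ** 2))
            + 0.5 * eps * (penalty(S_d, U) + penalty(S_m, V)))

def update_rows(A, U, V, S, sigma, eps):
    """Newton step for each row of U with V fixed; S must be symmetric."""
    S_off = S - np.diag(np.diag(S))   # self-similarity does not affect the loss
    gram = V.T @ V
    eye = np.eye(V.shape[1])
    for i in range(U.shape[0]):
        rhs = A[i] @ V + 2.0 * eps * S_off[i] @ U
        hessian = gram + (sigma + 2.0 * eps * S_off[i].sum()) * eye
        U[i] = np.linalg.solve(hessian, rhs)
    return U

def scmf(A, S_d, S_m, rank, sigma=4.0, eps=1.0, iterations=50, seed=0):
    """Alternate row updates of U and V; return the score matrix and loss history."""
    rng = np.random.default_rng(seed)
    U = rng.random((A.shape[0], rank))
    V = rng.random((A.shape[1], rank))
    history = []
    for _ in range(iterations):
        U = update_rows(A, U, V, S_d, sigma, eps)
        V = update_rows(A.T, V, U, S_m, sigma, eps)
        history.append(loss(A, U, V, S_d, S_m, sigma, eps))
    return U @ V.T, history

rng = np.random.default_rng(1)

def random_symmetric(n):
    X = rng.random((n, n))
    return (X + X.T) / 2.0

A = (rng.random((6, 5)) < 0.3).astype(float)   # toy association matrix
S_d, S_m = random_symmetric(6), random_symmetric(5)
scores, history = scmf(A, S_d, S_m, rank=3, iterations=20)
```

Because every row update exactly minimizes the assumed objective with all other rows held fixed, the recorded loss cannot increase from one iteration to the next, which gives a simple sanity check for an implementation.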

Results

Parameters optimization

In this section, the parameters γ, σ and ε are analyzed quantitatively to study their effect on prediction performance. γ represents the dimension of the disease and miRNA features in the low-rank space; since γ<min (nd, nm), it can be expressed as a percentage of min (nd, nm). σ and ε are the regularization parameters. The AUC under 5-CV is used to evaluate the influence of the parameter choices on model performance. Extensive test experiments showed that γ affects performance independently of the other parameters. For this reason, we fixed σ and ε at a suitable combination and searched for the best value of γ∈{0,10%,…,1} in SCMFMDA. To check the reliability of this test, it was repeated with σ and ε fixed at different combinations. As shown in Fig 3A, SCMFMDA obtained the best performance when γ = 50%. γ was then fixed at 50% so that the effect of the regularization parameters σ and ε could be evaluated clearly. We built SCMFMDA with all combinations of σ∈{2⁻³,2⁻²,…,2³} and ε∈{2⁻³,2⁻²,…,2³}. As shown in Fig 3B, SCMFMDA achieved its best AUC of 0.9447 when σ = 2² and ε = 2⁰. In summary, γ, σ and ε are set to 50%, 2² and 2⁰ in our model, respectively.

Fig 3. The influence of parameters on SCMFMDA: (A) the influence of γ; (B) the influence of σ and ε.

https://doi.org/10.1371/journal.pcbi.1009165.g003

Model comparison

To evaluate the predictive ability of SCMFMDA, we compared it with several previous computational methods for predicting unknown miRNA-disease associations. To make the comparison fair, all methods were trained on the same dataset (the HMDD v2.0 database). These methods are as follows.

MSCHLMDA [55] is a multi-similarity based combinative hypergraph learning model (published in 2020).
ICFMDA [56] is an improved collaborative filtering-based computational model (published in 2018).
SACMDA [57] is a short acyclic connections-based computational model (published in 2018).
GRNMF [58] is a graph regularized non-negative matrix factorization-based model (published in 2018).
GRL_2,1−NMF [59] is a graph Laplacian regularized L_2,1-nonnegative matrix factorization-based computational model (published in 2020).
NPCMF [60] is a nearest profile-based collaborative matrix factorization model (published in 2019).
KBMFMDA [61] is a kernelized Bayesian matrix factorization-based computational model (published in 2020).

Based on the HMDD v2.0 database, which includes 5430 verified associations and 184155 unverified associations between 383 diseases and 495 miRNAs, global leave-one-out cross validation (global LOOCV) and five-fold cross validation (5-CV) were used to evaluate the prediction performance of these methods. In global LOOCV, each verified miRNA-disease association was used in turn as the test sample, and the remaining verified associations formed the training set. All unknown miRNA-disease associations were considered candidate samples. Similarly, in 5-CV, the verified miRNA-disease associations were randomly divided into five parts; each part was used in turn as the test set, and the other four parts formed the training set. Again, all unknown miRNA-disease associations were considered candidate samples. In both global LOOCV and 5-CV, we applied SCMFMDA to obtain the predicted association scores, so that the ranking of each test sample relative to the candidate samples could be calculated. A test sample whose ranking was higher than a given threshold was regarded as correctly predicted. The receiver operating characteristic (ROC) curve was then obtained by plotting the true positive rate (TPR) against the false positive rate (FPR) at different thresholds, and the area under the ROC curve (AUC), whose value lies between 0 and 1, was used to evaluate the performance of SCMFMDA. The AUCs of the other computational methods were obtained in the same way using the HMDD v2.0 database.
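The evaluation protocol can be sketched in Python as follows. The sketch shows two pieces under simplifying assumptions: a generator that hides one fifth of the known associations at a time, and a rank-based AUC that compares the scores of held-out associations with those of the candidate (unknown) associations, which is equivalent to the area under the ROC curve. The function names and the toy matrix are hypothetical, and the pairwise comparison used here would need a more memory-efficient formulation for the full data set.

```python
import numpy as np

def five_fold_splits(A, seed=0):
    """Yield (training matrix, held-out index pairs) for each of five folds."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(A == 1)
    rng.shuffle(known)  # shuffles the (row, column) pairs in place
    for fold in np.array_split(known, 5):
        train = A.copy()
        train[fold[:, 0], fold[:, 1]] = 0  # hide the test associations
        yield train, fold

def roc_auc(test_scores, candidate_scores):
    """Probability that a held-out association outranks a candidate (ties count 1/2)."""
    pos = np.asarray(test_scores, dtype=float)
    neg = np.asarray(candidate_scores, dtype=float)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy association matrix with 10 known associations
A = np.zeros((4, 5))
A[[0, 0, 1, 1, 2, 2, 3, 3, 0, 1], [0, 1, 1, 2, 2, 3, 3, 4, 4, 0]] = 1
splits = list(five_fold_splits(A))
auc_example = roc_auc([0.9, 0.2], [0.5, 0.1])
```

In a full run, the model would be retrained on each `train` matrix, and the scores at the held-out positions and at the positions where the original matrix is 0 would be passed to `roc_auc`.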

In this work, under global LOOCV, SCMFMDA, MSCHLMDA, ICFMDA and SACMDA achieved AUC values of 0.9675, 0.9287, 0.9072 and 0.8777, respectively (Fig 4). To reduce the potential bias caused by random sample division, we repeated the random division of the verified miRNA-disease associations 100 times in 5-CV; the average AUC values of SCMFMDA, MSCHLMDA, ICFMDA and SACMDA reached 0.9447, 0.9263, 0.9046, and 0.8773, respectively (Fig 5). The prediction performance of SCMFMDA was clearly better than that of the other methods.

Fig 4. AUC of global LOOCV compared with those of MSCHLMDA, ICFMDA and SACMDA.

https://doi.org/10.1371/journal.pcbi.1009165.g004

Fig 5. AUC of 5-CV compared with those of MSCHLMDA, ICFMDA and SACMDA.

https://doi.org/10.1371/journal.pcbi.1009165.g005

To further assess the performance of SCMFMDA, we also compared it with other state-of-the-art matrix factorization-based methods, namely GRNMF, GRL_2,1−NMF, NPCMF and KBMFMDA. The 5-CV results of all models are shown in Table 1; SCMFMDA clearly has the best AUC. The advantages of SCMFMDA over other matrix factorization-based models are as follows: first, SCMFMDA uses more biological similarity data than the other models; second, SCMFMDA uses SNF instead of the traditional linear combination method to integrate the various similarity data, which helps preserve the completeness and effectiveness of the experimental data; third, the L₂ regularization and similarity constraint terms added to the NMF objective function help to correctly discover more unknown miRNA-disease connections.

Table 1. Comparisons between SCMFMDA and other MF-based models.

https://doi.org/10.1371/journal.pcbi.1009165.t001

Case studies

To demonstrate the effectiveness and accuracy of SCMFMDA, we carried out an evaluation experiment in this section. We selected two human diseases, colon neoplasms and lung neoplasms, to validate the performance of our method. Both diseases do great harm to human health. Colon neoplasms are malignant tumors that have been confirmed to be associated with several miRNAs [62,63]. Lung neoplasms are among the most dangerous malignancies, with the fastest increase in morbidity and mortality [12]. A growing body of evidence indicates that lung neoplasms are closely related to a number of miRNAs. For a given disease, all verified associations in the HMDD v2.0 database are used as training samples, and the unverified associations involving that disease are treated as candidate samples. After training the model, we ranked the candidate samples by their predicted association scores and selected the top 50 candidate associations for the disease. We then used two databases, miR2Disease and dbDEMC v2.0, to check the ranked miRNAs. Tables 2 and 3 list the prediction results obtained by SCMFMDA for colon neoplasms and lung neoplasms, respectively. Of the top 50 miRNAs predicted by our model, 94% and 92% were confirmed to be associated with colon neoplasms and lung neoplasms, respectively, according to the miR2Disease and dbDEMC v2.0 databases. Only 3 and 4 of the top 50 predicted miRNAs for colon neoplasms and lung neoplasms, respectively, could not be confirmed by these databases.
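The candidate ranking used in the case studies can be sketched in Python as follows. The function name `top_candidates`, the toy score matrix and the miRNA labels are hypothetical; the sketch only shows how known associations are excluded and the remaining miRNAs are sorted by predicted score.

```python
import numpy as np

def top_candidates(scores, A, disease_idx, mirna_names, k=50):
    """Rank miRNAs without a known association to the disease by predicted score."""
    row = scores[disease_idx]
    candidates = np.flatnonzero(A[disease_idx] == 0)   # unverified associations only
    order = candidates[np.argsort(-row[candidates], kind="stable")]
    return [(mirna_names[j], float(row[j])) for j in order[:k]]

# Toy example: one disease, four miRNAs, the first association already known
scores = np.array([[0.9, 0.2, 0.7, 0.4]])
A = np.array([[1, 0, 0, 0]])
names = ["m1", "m2", "m3", "m4"]
ranked = top_candidates(scores, A, disease_idx=0, mirna_names=names, k=2)
```

The returned list would then be checked entry by entry against external databases such as miR2Disease and dbDEMC v2.0.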

Table 2. The top 50 potential miRNAs associated with colon neoplasms.

https://doi.org/10.1371/journal.pcbi.1009165.t002

Table 3. The top 50 potential miRNAs associated with lung neoplasms.

https://doi.org/10.1371/journal.pcbi.1009165.t003

Discussion and conclusion

In this paper, we introduced a new model named SCMFMDA, which uses a similarity constrained matrix factorization algorithm to predict potential miRNA-disease associations. To make use of multiple sources of disease and miRNA similarity data, the similarity network fusion algorithm is applied to integrate the various types of disease and miRNA biological information, respectively. In addition, L₂ regularization terms and similarity constraint terms are added to the standard NMF to predict more unobserved miRNA-disease associations. Under global LOOCV and 5-CV, SCMFMDA achieved AUCs of 0.9675 and 0.9447, respectively, which indicates an improvement in performance over previous models. Furthermore, most of the predicted miRNAs related to colon neoplasms and lung neoplasms were confirmed by experimental literature, which supports the reliability of the prediction results.

It should be noted that the following factors may contribute to the reliable performance of SCMFMDA. First, the similarity network fusion algorithm was applied to integrate different disease and miRNA similarities, which ensures the richness of the biological data used in the experiments. Second, the L₂ regularization terms help to avoid overfitting. Third, the similarity constraint terms, which consist of disease feature-based similarity and miRNA feature-based similarity, make the model more robust with respect to the richness of the data.
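To make the roles of these terms concrete, the sketch below evaluates a generic similarity-constrained factorization loss. It is an illustrative form only: the exact SCMFMDA objective is the one defined in the Methods and may differ. The graph-Laplacian smoothness term, the symbols `lam` and `mu`, and the function names are assumptions, and the nonnegativity constraint of NMF is not enforced here.

```python
import numpy as np

def laplacian(similarity):
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix."""
    s = np.asarray(similarity, dtype=float)
    return np.diag(s.sum(axis=1)) - s

def objective(Y, A, B, S_m, S_d, lam, mu):
    """Generic similarity-constrained factorization loss (illustrative form).

    Y: miRNA x disease association matrix; A, B: latent factors with Y ~ A @ B.T.
    S_m, S_d: integrated miRNA and disease similarity matrices.
    """
    # Reconstruction error of the known association matrix.
    fit = np.linalg.norm(Y - A @ B.T, "fro") ** 2
    # L2 (Frobenius) penalty that discourages large factors and overfitting.
    l2 = lam * (np.linalg.norm(A, "fro") ** 2 + np.linalg.norm(B, "fro") ** 2)
    # tr(A^T L A) = 0.5 * sum_ij S_ij * ||a_i - a_j||^2: similar items get close factors.
    smooth = mu * (np.trace(A.T @ laplacian(S_m) @ A)
                   + np.trace(B.T @ laplacian(S_d) @ B))
    return fit + l2 + smooth
```

In this form, increasing `lam` shrinks the factors, while increasing `mu` pulls the latent vectors of similar miRNAs (or similar diseases) toward each other.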

However, several limitations may influence the performance of SCMFMDA. First, the model is applicable only to diseases and miRNAs that appear in the selected dataset; it cannot make predictions for other diseases and miRNAs. In addition, for some important parameters in SCMFMDA, we had no appropriate way to select the most suitable values other than testing all combinations. Therefore, we will continue to optimize our model to improve its performance in future work.
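Testing all parameter combinations, as mentioned above, amounts to an exhaustive grid search. A minimal sketch follows; the parameter names `rank`, `lam` and `mu`, the candidate values, and the stand-in scoring function are hypothetical and are not the settings used in this study.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Evaluate every parameter combination and return the best one.

    evaluate: callable mapping a dict of parameters to a score (higher is
              better), for example a cross-validated AUC.
    grid: dict mapping parameter name -> list of candidate values.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(p):
    """Stand-in for a cross-validated AUC; peaks at rank=20, lam=0.1, mu=1.0."""
    return -abs(p["rank"] - 20) - abs(p["lam"] - 0.1) - abs(p["mu"] - 1.0)

# Hypothetical grid: 3 * 3 * 2 = 18 combinations are evaluated.
grid = {"rank": [10, 20, 30], "lam": [0.01, 0.1, 1.0], "mu": [0.1, 1.0]}
best, score = grid_search(toy_score, grid)
```

The cost grows multiplicatively with the number of values per parameter, which is why exhaustive search becomes expensive as more parameters are tuned.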

Supporting information

S1 Table. Known human miRNA-disease associations obtained from HMDD v2.0 database.

https://doi.org/10.1371/journal.pcbi.1009165.s001

(XLSX)

S2 Table. Names of 383 diseases involved in known human miRNA-disease associations obtained from HMDD v2.0 database.

https://doi.org/10.1371/journal.pcbi.1009165.s002

(XLSX)

S3 Table. Names of 495 miRNAs involved in known human miRNA-disease associations obtained from HMDD v2.0 database.

https://doi.org/10.1371/journal.pcbi.1009165.s003

(XLSX)

S4 Table. The constructed disease functional similarity score matrix.

https://doi.org/10.1371/journal.pcbi.1009165.s004

(XLSX)

S5 Table. The constructed disease semantic similarity score matrix.

https://doi.org/10.1371/journal.pcbi.1009165.s005

(XLSX)

S6 Table. The constructed miRNA functional similarity score matrix.

https://doi.org/10.1371/journal.pcbi.1009165.s006

(XLSX)

S7 Table. The constructed miRNA sequence similarity score matrix.

https://doi.org/10.1371/journal.pcbi.1009165.s007

(XLSX)

References

1. Bartel DP. MicroRNAs: Genomics, Biogenesis, Mechanism, and Function. Cell. 2004;116(2):281–297. pmid:14744438
2. Chatterjee S, Grosshans H. Active turnover modulates mature microRNA activity in Caenorhabditis elegans. Nature. 2009;461(7263):546–549. pmid:19734881
3. He L, Hannon GJ. MicroRNAs: small RNAs with a big role in gene regulation. Nat Rev Genet. 2004;5(7):522–531. pmid:15211354
4. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75(5):843–854. pmid:8252621
5. Wightman B, Ha I, Ruvkun G. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell. 1993;75(5):855–862. pmid:8252622
6. Jopling CL, Yi M, Lancaster AM, Lemon SM, Sarnow P. Modulation of Hepatitis C Virus RNA Abundance by a Liver-Specific MicroRNA. Science. 2005;309(5740):1577–1581. pmid:16141076
7. Xu P, Guo M, Hay BA. MicroRNAs and the regulation of cell death. Trends Genet. 2004;20(12):617–624. pmid:15522457
8. Bartel DP. MicroRNAs: Target Recognition and Regulatory Functions. Cell. 2009;136(2):215–233. pmid:19167326
9. Miska EA. How microRNAs control cell division, differentiation and death. Curr Opin Genet Dev. 2005;15(5):563–568. pmid:16099643
10. Harfe BD. MicroRNAs in vertebrate development. Curr Opin Genet Dev. 2005;15(4):410–415. pmid:15979303
11. Meola N, Gennarino V, Banfi S. microRNAs and genetic diseases. Pathogenetics. 2009;2(1):7. pmid:19889204
12. Yanaihara N, Caplen N, Bowman E, Seike M, Kumamoto K. Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell. 2006;9(3):189–198. pmid:16530703
13. Sita-Lumsden A, Dart DA, Waxman J, Bevan CL. Circulating microRNAs as potential new biomarkers for prostate cancer. Br J Cancer. 2013;108(10):1925–1930. pmid:23632485
14. Mohammadi-Yeganeh S, Paryan M, Samiee SM, Soleimani M, Arefian E, Azadmanesh K, et al. Development of a robust, low cost stem-loop real-time quantification PCR technique for miRNA expression analysis. Mol Biol Rep. 2013;40(5):3665–3674. pmid:23307300
15. Thomson JM, Parker JS, Hammond SM. Microarray Analysis of miRNA Gene Expression. Methods Enzymol. 2007;427:107–122. pmid:17720481
16. Han K, Xuan P, Ding J, Zhao ZJ, Hui L, Zhong YL. Prediction of disease-related microRNAs by incorporating functional similarity and common association information. Genet Mol Res. 2014;13(1):2009–2019. pmid:24737426
17. Yu S, Liang C, Xiao Q, Li G, Ding P, Luo J. MCLPMDA: A novel method for miRNA-disease association prediction based on matrix completion and label propagation. J Cell Mol Med. 2019;23(2):1427–1438. pmid:30499204
18. Chen X, Gong Y, Zhang D, You Z, Li Z. DRMDA: deep representations–based miRNA–disease association prediction. J Cell Mol Med. 2018;22(1):472–485. pmid:28857494
19. Jiang Q, Hao Y, Wang G, Juan L, Wang Y. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol. 2010;4(SUPPL. 1):S2. pmid:20522252
20. Chen X, Yan C, Zhang X, You Z, Deng L, Liu Y, et al. WBSMDA: Within and Between Score for MiRNA-Disease Association prediction. Sci Rep. 2016;6:21106. pmid:26880032
21. Shen Z, Zhang YH, Han K, Nandi AK, Honig B, Huang DS. miRNA-Disease Association Prediction with Collaborative Matrix Factorization. Complexity. 2017;2017:1–9.
22. Chen X, Huang L, Xie D, Zhao Q. EGBMMDA: Extreme Gradient Boosting Machine for MiRNA-Disease Association prediction. Cell Death Dis. 2018;9(1):3. pmid:29305594
23. Zhao Y, Chen X, Yin J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics. 2019;35(22):4730–4738. pmid:31038664
24. Zhu XY, Wang XZ, Zhao HC, Pei TR, Kuang LN, Wang L. BHCMDA: A New Biased Conduction Based Method for Potential MiRNA-Disease Association Prediction. Front Genet. 2020;11(1):384. pmid:32425979
25. Ha J, Park C, Park C, Park S. IMIPMF: Inferring miRNA-disease interactions using probabilistic matrix factorization. J Biomed Inform. 2020;102:103358. pmid:31857202
26. Chen X, Liu M, Yan G. RWRMDA: predicting novel human microRNA-disease associations. Mol Biosyst. 2012;8(10):2792–2798. pmid:22875290
27. Köhler S, Bauer S, Horn D, Robinson PN. Walking the Interactome for Prioritization of Candidate Disease Genes. Am J Hum Genet. 2008;82(4):949–958. pmid:18371930
28. Zhang H, Cao L, Gao S. A locality correlation preserving support vector machine. Pattern Recognition. 2014;47(9):3168–3178.
29. Shi H, Xu J, Zhang G, Xu L, Xia L. Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes. BMC Syst Biol. 2013;7:101. pmid:24103777
30. Liu Y, Zeng X, He Z, Zou Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform. 2017;14(4):905–915. pmid:27076459
31. Luo J, Xiao Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J Biomed Inform. 2017;66:194–203. pmid:28104458
32. Niu Y, Wang G, Yan G, Chen X. Integrating random walk and binary regression to identify novel miRNA-disease association. BMC Bioinformatics. 2019;20:59. pmid:30691413
33. Chen X, Yan CC, Zhang X, Li Z, Deng L, Zhang Y, et al. RBMMMDA: predicting multiple types of disease-microRNA associations. Sci Rep. 2015;5(1):13877. pmid:26347258
34. You Z, Huang Z, Zhu Z, Yan G, Chen X. PBMDA: A novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput Biol. 2017;13(1):e1005455. pmid:28339468
35. Yan C, Wang JX, Ni P, Lan W, Wu FX, Pan Y. DNRLMF-MDA: Predicting microRNA-Disease Associations Based on Similarities of microRNAs and Diseases. IEEE/ACM Trans Comput Biol Bioinform. 2019;16(1):233–243. pmid:29990253
36. Peng J, Hui W, Li Q, Chen B, Hao J, Jiang Q, et al. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics. 2019;35(21):4364–4371. pmid:30977780
37. Zheng K, You ZH, Wang L, Zhou Y, Li LP, Li ZW. MLMDA: a machine learning approach to predict and validate MicroRNA-disease associations by integrating of heterogeneous information source. J Transl Med. 2019;17(1):260. pmid:31395072
38. Chen X, Sun LG, Zhao Y. NCMCMDA: miRNA-disease association prediction through neighborhood constraint matrix completion. Brief Bioinform. 2021;22(1):485–496. pmid:31927572
39. Zhang Y, Chen M, Cheng X, Wei H. MSFSP: A Novel miRNA-Disease Association Prediction Model by Federating Multiple-Similarities Fusion and Space Projection. Front Genet. 2020;11:389. pmid:32425980
40. Ji C, Wang Y, Gao Z, Li L, Zheng C. A Semi-Supervised Learning Method for MiRNA-Disease Association Prediction Based on Variational Autoencoder. IEEE/ACM Trans Comput Biol Bioinform. 2021;1(1):99. pmid:33735084
41. Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2013;42(Database issue): D1070–D1074. pmid:24194601
42. Jiang Q, Wang Y, Hao Y, Liran J, Teng M, Zhang X, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37(Database issue): D98–D104. pmid:18927107
43. Yang Z, Wu L, Wang A, Tang W, Zhao Y, Zhao H, et al. dbDEMC 2.0: Updated database of differentially expressed miRNAs in human cancers. Nucleic Acids Res. 2016;45(D1):D812–D818. pmid:27899556
44. Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011;21(7):1109–1121. pmid:21536720
45. Cheng L, Wang G, Li J, Zhang T, Xu P, Wang Y, et al. SIDD: A Semantically Integrated Database towards a Global View of Human Disease. PLoS One. 2013;8(10):e75504. pmid:24146757
46. Lipscomb CE. Medical Subject Headings (MeSH). Bull Med Libr Assoc. 2000;88(3):265–266. pmid:10928714
47. Wang D, Wang JY, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–1650. pmid:20439255
48. Xuan P, Han K, Guo MZ, Guo YH, Li JB, Ding J, et al. Correction: Prediction of microRNAs Associated with Human Diseases Based on Weighted k Most Similar Neighbors. PLoS One. 2013;8(9):e70204. pmid:24116246
49. Goh KI, Cusick ME, Valle D, Childs B, Barabási AL. The human disease network. Proc Natl Acad Sci U S A. 2007;104(27):8685–8690. pmid:17502601
50. Lu M, Zhang Q, Min D, Jing M, Guo Y, Guo W, et al. An Analysis of Human MicroRNA and Disease Associations. PLoS One. 2008;3(10):e3420. pmid:18923704
51. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2013;42(D1):D68–D73. pmid:24275495
52. Wang B, Mezlini AM, Demir F, Fiume M, Tu ZW, Brudno M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11(3):333–337. pmid:24464287
53. Zhang W, Liu XR, Chen YL, Wu WJ, Wang W, Li XH. Feature-derived graph regularized matrix factorization for predicting drug side effects. Neurocomputing. 2018;287:154–162.
54. Rana B, Juneja A, Saxena M, Gudwani S, Kumaran SS, Behari M, et al. Graph Theory based Spectral Feature Selection for Computer Aided Diagnosis of Parkinson’s Disease Using T1-weighted MRI. International Journal of Imaging Systems and Technology. 2015;25(3):245–255.
55. Wu Q, Wang Y, Gao Z, Ni J, Zheng C. MSCHLMDA: Multi-Similarity Based Combinative Hypergraph Learning for Predicting MiRNA-Disease Association. Front Genet. 2020;11:354. pmid:32351545
56. Jiang Y, Liu B, Yu L, Yan C, Bian H. Predict MiRNA-Disease Association with Collaborative Filtering. Neuroinformatics. 2018;16:363–372. pmid:29948843
57. Shao B, Liu B, Yan C. SACMDA: MiRNA-Disease Association Prediction with Short Acyclic Connections in Heterogeneous Graph. Neuroinformatics. 2018;16:373–382. pmid:29644547
58. Xiao Q, Luo J, Liang C, Cai J, Ding P. A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics. 2018;34(2):239–248. pmid:28968779
59. Gao Z, Wang Y, Wu Q, Ni J, Zheng C. Graph regularized L_2,1-nonnegative matrix factorization for miRNA-disease association prediction. BMC Bioinformatics. 2020;21:61. pmid:32070280
60. Gao Y, Cui Z, Liu J, Wang J, Zheng C. NPCMF: Nearest Profile-based Collaborative Matrix Factorization method for predicting miRNA-disease associations. BMC Bioinformatics. 2019;20(1):353. pmid:31234797
61. Chen X, Li S, Yin J, Wang C. Potential miRNA-disease association prediction based on kernelized Bayesian matrix factorization. Genomics. 2020;112(1):809–819. pmid:31136792
62. Torre LA, Bray F, Siegel RL, Tieulent JL, Jemal A. Global cancer statistics, 2012. CA Cancer J Clin. 2015;65(2):87–108. pmid:25651787
63. Hiroko OK, Masashi I, Daisuke K, Yoshitaka H, Yasuhide Y, Koh F, et al. Circulating Exosomal microRNAs as Biomarkers of Colon Cancer. PLoS One. 2014;9(4):e92921. pmid:24705249

[ref1] 1. Bartel DP. MicroRNAs: Genomics, Biogenesis, Mechanism, and Function. Cell. 2004;116(2):281–297. pmid:14744438
View Article
PubMed/NCBI
Google Scholar

[2] View Article

[3] PubMed/NCBI

[4] Google Scholar

[ref2] 2. Chatterjee S, Grosshans H. Active turnover modulates mature microRNA activity in Caenorhabditis elegans. Nature. 2009;461(7263):546–549. pmid:19734881
View Article
PubMed/NCBI
Google Scholar

[6] View Article

[7] PubMed/NCBI

[8] Google Scholar

[ref3] 3. He L, Hannon GJ. MicroRNAs: small RNAs with a big role in gene regulation. Nat Rev Genet. 2004;5(7):522–531. pmid:15211354
View Article
PubMed/NCBI
Google Scholar

[10] View Article

[11] PubMed/NCBI

[12] Google Scholar

[ref4] 4. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75(5):843–854. pmid:8252621
View Article
PubMed/NCBI
Google Scholar

[14] View Article

[15] PubMed/NCBI

[16] Google Scholar

[ref5] 5. Wightman B, Ha I, Ruvkun G. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell. 1993;75(5):855–862. pmid:8252622
View Article
PubMed/NCBI
Google Scholar

[18] View Article

[19] PubMed/NCBI

[20] Google Scholar

[ref6] 6. Jopling CL, Yi M, Lancaster AM, Lemon SM, Sarnow P. Modulation of Hepatitis C Virus RNA Abundance by a Liver-Specific MicroRNA. Science. 2005;309(5740):1577–1581. pmid:16141076
View Article
PubMed/NCBI
Google Scholar

[22] View Article

[23] PubMed/NCBI

[24] Google Scholar

[ref7] 7. Xu P, Guo M, Hay BA. MicroRNAs and the regulation of cell death. Trends Genet. 2004;20(12):617–624. pmid:15522457
View Article
PubMed/NCBI
Google Scholar

[26] View Article

[27] PubMed/NCBI

[28] Google Scholar

[ref8] 8. Bartel DP. MicroRNAs: Target Recognition and Regulatory Functions. Cell. 2009;136(2):215–233. pmid:19167326
View Article
PubMed/NCBI
Google Scholar

[30] View Article

[31] PubMed/NCBI

[32] Google Scholar

[ref9] 9. Miska EA. How microRNAs control cell division, differentiation and death. Curr Opin Genet Dev. 2005;15(5):563–568. pmid:16099643
View Article
PubMed/NCBI
Google Scholar

[34] View Article

[35] PubMed/NCBI

[36] Google Scholar

[ref10] 10. Harfe BD. MicroRNAs in vertebrate development. Curr Opin Genet Dev. 2005;15(4):410–415. pmid:15979303
View Article
PubMed/NCBI
Google Scholar

[38] View Article

[39] PubMed/NCBI

[40] Google Scholar

[ref11] 11. Meola N, Gennarino V, Banfi S. microRNAs and genetic diseases. Pathogenetics. 2009;2(1):7. pmid:19889204
View Article
PubMed/NCBI
Google Scholar

[42] View Article

[43] PubMed/NCBI

[44] Google Scholar

[ref12] 12. Yanaihara N, Caplen N, Bowman E, Seike M, Kumamoto K. Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell. 2006;9(3):189–198. pmid:16530703
View Article
PubMed/NCBI
Google Scholar

[46] View Article

[47] PubMed/NCBI

[48] Google Scholar

[ref13] 13. Sita-Lumsden A, Dart DA, Waxman J, Bevan CL. Circulating microRNAs as potential new biomarkers for prostate cancer. Br J Cancer. 2013;108(10):1925–1930. pmid:23632485
View Article
PubMed/NCBI
Google Scholar

[50] View Article

[51] PubMed/NCBI

[52] Google Scholar

[ref14] 14. Mohammadi-Yeganeh S, Paryan M, Samiee SM, Soleimani M, Arefian E, Azadmanesh K, et al. Development of a robust, low cost stem-loop real-time quantification PCR technique for miRNA expression analysis. Mol Biol Rep. 2013;40(5):3665–3674. pmid:23307300
View Article
PubMed/NCBI
Google Scholar

[54] View Article

[55] PubMed/NCBI

[56] Google Scholar

[ref15] 15. Thomson JM, Parker JS, Hammond SM. Microarray Analysis of miRNA Gene Expression. Methods Enzymol. 2007;427:107–122. pmid:17720481
View Article
PubMed/NCBI
Google Scholar

[58] View Article

[59] PubMed/NCBI

[60] Google Scholar

[ref16] 16. Han K, Xuan P, Ding J, Zhao ZJ, Hui L, Zhong YL. Prediction of disease-related microRNAs by incorporating functional similarity and common association information. Genet Mol Res. 2014;13(1):2009–2019. pmid:24737426
View Article
PubMed/NCBI
Google Scholar

[62] View Article

[63] PubMed/NCBI

[64] Google Scholar

[ref17] 17. Yu S, Liang C, Xiao Q, Li G, Ding P, Luo J. MCLPMDA: A novel method for miRNA-disease association prediction based on matrix completion and label propagation. J Cell Mol Med. 2019;23(2):1427–1438. pmid:30499204
View Article
PubMed/NCBI
Google Scholar

[66] View Article

[67] PubMed/NCBI

[68] Google Scholar

[ref18] 18. Chen X, Gong Y, Zhang D, You Z, Li Z. DRMDA: deep representations–based miRNA–disease association prediction. J Cell Mol Med. 2018;22(1):472–485. pmid:28857494
View Article
PubMed/NCBI
Google Scholar

[70] View Article

[71] PubMed/NCBI

[72] Google Scholar

[ref19] 19. Jiang Q, Hao Y, Wang G, Juan L, Wang Y. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol. 2010;4(SUPPL. 1):S2. pmid:20522252
View Article
PubMed/NCBI
Google Scholar

[74] View Article

[75] PubMed/NCBI

[76] Google Scholar

[ref20] 20. Chen X, Yan C, Zhang X, You Z, Deng L, Liu Y, et al. WBSMDA: Within and Between Score for MiRNA-Disease Association prediction. Sci Rep. 2016;6:21106. pmid:26880032
View Article
PubMed/NCBI
Google Scholar

[78] View Article

[79] PubMed/NCBI

[80] Google Scholar

[ref21] 21. Shen Z, Zhang YH, Han K, Nandi AK, Honig B, Huang DS. miRNA-Disease Association Prediction with Collaborative Matrix Factorization. Complexity. 2017;2017:1–9.
View Article
Google Scholar

[82] View Article

[83] Google Scholar

[ref22] 22. Chen X, Huang L, Xie D, Zhao Q. EGBMMDA: Extreme Gradient Boosting Machine for MiRNA-Disease Association prediction. Cell Death Dis. 2018;9(1):3. pmid:29305594
View Article
PubMed/NCBI
Google Scholar

[85] View Article

[86] PubMed/NCBI

[87] Google Scholar

[ref23] 23. Zhao Y, Chen X, Yin J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics. 2019;35(22):4730–4738. pmid:31038664
View Article
PubMed/NCBI
Google Scholar

[89] View Article

[90] PubMed/NCBI

[91] Google Scholar

[ref24] 24. Zhu XY, Wang XZ, Zhao HC, Pei TR, Kuang LN, Wang L. BHCMDA: A New Biased Conduction Based Method for Potential MiRNA-Disease Association Prediction. Front Genet. 2020;11(1):384. pmid:32425979
View Article
PubMed/NCBI
Google Scholar

[93] View Article

[94] PubMed/NCBI

[95] Google Scholar

[ref25] 25. Ha J, Park C, Park C, Park S. IMIPMF: Inferring miRNA-disease interactions using probabilistic matrix factorization. J Biomed Inform. 2020;102:103358. pmid:31857202
View Article
PubMed/NCBI
Google Scholar

[97] View Article

[98] PubMed/NCBI

[99] Google Scholar

[ref26] 26. Chen X, Liu M, Yan G. RWRMDA: predicting novel human microRNA-disease associations. Mol Biosyst. 2012;8(10):2792–2798. pmid:22875290
View Article
PubMed/NCBI
Google Scholar

[101] View Article

[102] PubMed/NCBI

[103] Google Scholar

[ref27] 27. Köhler S, Bauer S, Horn D, Robinson PN. Walking the Interactome for Prioritization of Candidate Disease Genes. The Am J Hum Genet. 2008;82(4):949–958. pmid:18371930
View Article
PubMed/NCBI
Google Scholar

[105] View Article

[106] PubMed/NCBI

[107] Google Scholar

[ref28] 28. Zhang H, Cao L, Gao S. A locality correlation preserving support vector machine. Pattern Recognition. 2014;47(9):3168–3178.
View Article
Google Scholar

[109] View Article

[110] Google Scholar

[ref29] 29. Shi H, Xu J, Zhang G, Xu L, Xia L. Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes. BMC Syst Biol. 2013;7:101. pmid:24103777
View Article
PubMed/NCBI
Google Scholar

[112] View Article

[113] PubMed/NCBI

[114] Google Scholar

[ref30] 30. Liu Y, Zeng X, He Z, Zou Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform. 2017;14(4):905–915. pmid:27076459
View Article
PubMed/NCBI
Google Scholar

[116] View Article

[117] PubMed/NCBI

[118] Google Scholar

[ref31] 31. Luo J, Xiao Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J Biomed Inform. 2017;66:194–203. pmid:28104458
View Article
PubMed/NCBI
Google Scholar

[120] View Article

[121] PubMed/NCBI

[122] Google Scholar

[ref32] 32. Niu Y, Wang G, Yan G, Chen X. Integrating random walk and binary regression to identify novel miRNA-disease association. BMC Bioinformatics. 2019;20:59. pmid:30691413
View Article
PubMed/NCBI
Google Scholar

[124] View Article

[125] PubMed/NCBI

[126] Google Scholar

[ref33] 33. Chen X, Yan CC, Zhang X, Li Z, Deng L, Zhang Y, et al. RBMMMDA: predicting multiple types of disease-microRNA associations. Sci Rep. 2015;5(1):13877. pmid:26347258
View Article
PubMed/NCBI
Google Scholar

[128] View Article

[129] PubMed/NCBI

[130] Google Scholar

[ref34] 34. You Z, Huang Z, Zhu Z, Yan G, Chen X. PBMDA: A novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput Biol. 2017;13(1):e1005455. pmid:28339468
View Article
PubMed/NCBI
Google Scholar

[132] View Article

[133] PubMed/NCBI

[134] Google Scholar

[ref35] 35. Yan C, Wang JX, Ni P, Lan W, Wu FX, Pan Y. DNRLMF-MDA: Predicting microRNA-Disease Associations Based on Similarities of microRNAs and Diseases. IEEE/ACM Trans Comput Biol Bioinform. 2019;16(1):233–243. pmid:29990253
View Article
PubMed/NCBI
Google Scholar

[136] View Article

[137] PubMed/NCBI

[138] Google Scholar

[ref36] 36. Peng J, Hui W, Li Q, Chen B, Hao J, Jiang Q, et al. A learning-based framework for miRNA-disease association identification using neural networks. Bioinformatics. 2019;35(21):4364–4371. pmid:30977780
View Article
PubMed/NCBI
Google Scholar

[140] View Article

[141] PubMed/NCBI

[142] Google Scholar

[ref37] 37. Zheng K, You ZH, Wang L, Zhou Y, Li LP, Li ZW. MLMDA: a machine learning approach to predict and validate MicroRNA-disease associations by integrating of heterogeneous information source. J Transl Med. 2019;17(1):260. pmid:31395072
View Article
PubMed/NCBI
Google Scholar

[144] View Article

[145] PubMed/NCBI

[146] Google Scholar

[ref38] 38. Chen X, Sun LG, Zhao Y. NCMCMDA: miRNA-disease association prediction through neighborhood constraint matrix completion. Brief Bioinform. 2021;22(1):485–496. pmid:31927572
View Article
PubMed/NCBI
Google Scholar

[148] View Article

[149] PubMed/NCBI

[150] Google Scholar

[ref39] 39. Zhang Y, Chen M, Cheng X, Wei H. MSFSP: A Novel miRNA-Disease Association Prediction Model by Federating Multiple-Similarities Fusion and Space Projection. Front Genet. 2020;11:389. pmid:32425980
View Article
PubMed/NCBI
Google Scholar

[152] View Article

[153] PubMed/NCBI

[154] Google Scholar

[ref40] 40. Ji C, Wang Y, Gao Z, Li L, Zheng C. A Semi-Supervised Learning Method for MiRNA-Disease Association Prediction Based on Variational Autoencoder. IEEE/ACM Trans Comput Biol Bioinform. 2021;1(1):99. pmid:33735084
View Article
PubMed/NCBI
Google Scholar

[156] View Article

[157] PubMed/NCBI

[158] Google Scholar

[ref41] 41. Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T, et al. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2013;42(Database issue): D1070–D1074. pmid:24194601
View Article
PubMed/NCBI
Google Scholar

[160] View Article

[161] PubMed/NCBI

[162] Google Scholar

[ref42] 42. Jiang Q, Wang Y, Hao Y, Liran J, Teng M, Zhang X, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009;37(Database issue): D98–D104. pmid:18927107
View Article
PubMed/NCBI
Google Scholar

[164] View Article

[165] PubMed/NCBI

[166] Google Scholar

[ref43] 43. Yang Z, Wu L, Wang A, Tang W, Zhao Y, Zhao H, et al. dbDEMC 2.0: Updated database of differentially expressed miRNAs in human cancers. Nucleic Acids Res. 2016;45(D1):D812–D818. pmid:27899556
View Article
PubMed/NCBI
Google Scholar

[168] View Article

[169] PubMed/NCBI

[170] Google Scholar

[ref44] 44. Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011;21(7):1109–1121. pmid:21536720
View Article
PubMed/NCBI
Google Scholar

[172] View Article

[173] PubMed/NCBI

[174] Google Scholar

[ref45] 45. Cheng L, Wang G, Li J, Zhang T, Xu P, Wang Y, et al. SIDD: A Semantically Integrated Database towards a Global View of Human Disease. PLoS One. 2013;8(10):e75504. pmid:24146757
View Article
PubMed/NCBI
Google Scholar

[176] View Article

[177] PubMed/NCBI

[178] Google Scholar

[ref46] 46. Lipscomb CE. Medical Subject Headings (MeSH). Bull Med Libr Assoc. 2000;88(3):265–266. pmid:10928714
View Article
PubMed/NCBI
Google Scholar

[180] View Article

[181] PubMed/NCBI

[182] Google Scholar

[ref47] 47. Wang D, Wang JY, Lu M, Song F, Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26(13):1644–1650. pmid:20439255
View Article
PubMed/NCBI
Google Scholar

[184] View Article

[185] PubMed/NCBI

[186] Google Scholar

[ref48] 48. Xuan P, Han K, Guo MZ, Guo YH, Li JB. Ding J, et al. Correction: Prediction of microRNAs Associated with Human Diseases Based on Weighted k Most Similar Neighbors. PLoS One. 2013;8(9):e70204. pmid:24116246
View Article
PubMed/NCBI
Google Scholar

[188] View Article

[189] PubMed/NCBI

[190] Google Scholar

[ref49] 49. Goh KI, Cusick ME, Valle D, Childs B, Barabási AL. The human disease network. Proc Natl Acad Sci U S A. 2007;104(27):8685–8690. pmid:17502601
View Article
PubMed/NCBI
Google Scholar

[192] View Article

[193] PubMed/NCBI

[194] Google Scholar

[ref50] 50. Lu M, Zhang Q, Min D, Jing M, Guo Y, Guo W, et al. An Analysis of Human MicroRNA and Disease Associations. PLoS One. 2008;3(10):e3420. pmid:18923704
View Article
PubMed/NCBI
Google Scholar

[196] View Article

[197] PubMed/NCBI

[198] Google Scholar

[ref51] 51. Kozomara A, Griffiths-jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 2013;42(D1):D68–D73. pmid:24275495
View Article
PubMed/NCBI
Google Scholar

[200] View Article

[201] PubMed/NCBI

[202] Google Scholar

[ref52] 52. Wang B, Mezlini AM, Demir F, Fiume M, Tu ZW, Brudno M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11(3):333–337. pmid:24464287
View Article
PubMed/NCBI
Google Scholar

[204] View Article

[205] PubMed/NCBI

[206] Google Scholar

[ref53] 53. Zhang W, Liu XR, Chen YL, Wu WJ, Wang W, Li XH. Feature-derived graph regularized matrix factorization for predicting drug side effects-Science Direct. Neurocomputing. 2018;287:154–162.

[ref54] 54. Rana B, Juneja A, Saxena M, Gudwani S, Kumaran SS, Behari M, et al. Graph Theory based Spectral Feature Selection for Computer Aided Diagnosis of Parkinson’s Disease Using T1-weighted MRI. International Journal of Imaging Systems and Technology. 2015;25(3):245–255.

[ref55] 55. Wu Q, Wang Y, Gao Z, Ni J, Zheng C. MSCHLMDA: Multi-Similarity Based Combinative Hypergraph Learning for Predicting MiRNA-Disease Association. Front Genet. 2020;11:354. pmid:32351545

[ref56] 56. Jiang Y, Liu B, Yu L, Yan C, Bian H. Predict MiRNA-Disease Association with Collaborative Filtering. Neuroinformatics. 2018;16:363–372. pmid:29948843

[ref57] 57. Shao B, Liu B, Yan C. SACMDA: MiRNA-Disease Association Prediction with Short Acyclic Connections in Heterogeneous Graph. Neuroinformatics. 2018;16:373–382. pmid:29644547

[ref58] 58. Xiao Q, Luo J, Liang C, Cai J, Ding P. A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics. 2018;34(2):239–248. pmid:28968779

[ref59] 59. Gao Z, Wang Y, Wu Q, Ni J, Zheng C. Graph regularized L₂,₁-nonnegative matrix factorization for miRNA-disease association prediction. BMC Bioinformatics. 2020;21:61. pmid:32070280

[ref60] 60. Gao Y, Cui Z, Liu J, Wang J, Zheng C. NPCMF: Nearest Profile-based Collaborative Matrix Factorization method for predicting miRNA-disease associations. BMC Bioinformatics. 2019;20(1):353. pmid:31234797

[ref61] 61. Chen X, Li S, Yin J, Wang C. Potential miRNA-disease association prediction based on kernelized Bayesian matrix factorization. Genomics. 2020;112(1):809–819. pmid:31136792

[ref62] 62. Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A. Global cancer statistics, 2012. CA Cancer J Clin. 2015;65(2):87–108. pmid:25651787

[ref63] 63. Hiroko OK, Masashi I, Daisuke K, Yoshitaka H, Yasuhide Y, Koh F, et al. Circulating Exosomal microRNAs as Biomarkers of Colon Cancer. PLoS One. 2014;9(4):e92921. pmid:24705249

Supporting information

S1 Table. Known human miRNA-disease associations obtained from HMDD v2.0 database.

S2 Table. Names of 383 diseases involved in known human miRNA-disease associations obtained from HMDD v2.0 database.

S3 Table. Names of 495 miRNAs involved in known human miRNA-disease associations obtained from HMDD v2.0 database.

S4 Table. The constructed disease functional similarity score matrix.

S5 Table. The constructed disease semantic similarity score matrix.

S6 Table. The constructed miRNA functional similarity score matrix.

S7 Table. The constructed miRNA sequence similarity score matrix.
