Background
In the past few decades, remarkable progress has been made in the life sciences and genomics. However, developing a new drug remains a high-risk, extremely expensive and time-consuming process [1, 2]. On average, it takes about 15 years and costs more than $800 million to discover a new drug and bring it to market [3, 4]. Despite tremendous investment in drug design and discovery, the number of new drugs approved by the U.S. Food and Drug Administration (FDA) has remained low since the 1990s [5, 6]. About 90% of new drugs designed for specific diseases fail the first phase of clinical trials, which makes drug design and discovery increasingly costly [7]. In light of these challenges, the repositioning of already commercialized drugs, which aims to identify new therapeutic uses for them, is attracting growing interest from biomedical researchers and pharmaceutical companies [8]. Since existing drugs have already passed safety evaluation in clinical trials, drug repositioning can lower risk and shorten the development process, and repositioned drugs are more likely to be approved by regulatory authorities [9]. Drug repositioning therefore plays an important role in drug research and development. Several existing drugs (e.g. Minoxidil, Thalidomide, Sildenafil) have been successfully repositioned, saving development costs and creating great economic value for the pharmaceutical companies involved [10]. For example, Minoxidil, originally commercialized to treat high blood pressure, was repositioned to treat androgenic alopecia; Thalidomide, first marketed as a sedative for insomnia and nausea, was later repurposed to treat erythema nodosum leprosum and multiple myeloma [11, 12]. Compared with the development of a novel drug for a specific indication, drug repositioning costs only about $300 million and can shorten the development cycle by more than half [10, 13]. As a result, more and more existing drugs are being repurposed to treat diseases other than those originally intended [14].
Drug repositioning can be framed as identifying associations between drugs and diseases. Although some drug-disease associations have been verified in clinical trials, many remain undiscovered. In recent years, computational approaches have been developed to infer drug-disease associations for drug repositioning, including semantic inference [1], network analysis [15], text mining [16] and machine learning [17]. For example, Napolitano et al. trained a multi-class Support Vector Machine (SVM) classifier on drug similarities to identify potential drug indications [18]. Gottlieb et al. constructed classification features by integrating disease similarities and drug similarities, and used a logistic regression classifier to score candidate drug-disease associations and predict novel therapeutic indications [19]. Based on the hypothesis that diseases sharing similar treatments can be treated with similar drugs, Chiang et al. developed a “guilt-by-association” approach to infer potential relationships between drugs and diseases [20]. Yang et al. built a causal network linking drug-target-pathway-gene-disease to calculate drug-disease association scores. Based on known drug-disease associations, a probabilistic matrix factorization model was learned to classify drug-disease associations, and novel associations were predicted according to the calculated association scores and association types [21]. However, these methods cannot predict associations for novel drugs that have no known related disease.
With the generation of large-scale high-throughput biological data, researchers are increasingly interested in building complex biomolecular interaction networks to predict such associations. Martínez et al. developed DrugNet, a model that infers new treatments for diseases and novel therapeutic indications for drugs [22]. This method predicts potential drug-disease associations by prioritization on a heterogeneous network that integrates biological information about drugs, targets and diseases. Wang et al. proposed a three-layer heterogeneous network-based computational method named TL-HGBI, which performs drug repositioning using known drug-disease associations together with drug, disease and target similarities [23]. Luo et al. presented a prediction model, MBiRW, which uses a bi-random walk algorithm to infer new drug indications based on the assumption that similar drugs tend to be associated with similar diseases [24].
Predicting novel indications for existing drugs can also be treated as a recommendation system problem. Recently, recommendation system models have been used to predict associations between biomolecules (e.g. drug-target interactions, circRNA-disease associations) [25, 26]. Luo et al. developed a drug repositioning recommendation system (DRRS) to infer new indications for existing drugs, which uses a fast Singular Value Thresholding (SVT) algorithm to complete the drug-disease association adjacency matrix [27]. Because collaborative filtering likewise amounts to finding missing entries in an adjacency matrix, matrix factorization is widely applied in collaborative filtering recommendation algorithms [28]. Recent studies have shown that matrix factorization has been used successfully for data representation in recommender systems and link prediction [29, 30], especially in the field of bioinformatics [31-33]. Inspired by these studies, we view drug-disease association prediction as a recommender system task and use matrix factorization to make predictions.
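To make this framing concrete, one generic formulation can be written down. It is only a sketch, and the symbols \(Y\), \(A\), \(B\), \(n\), \(m\) and \(k\) are assumed here for illustration rather than taken from the text above. Known associations are stored in a binary matrix \(Y \in \{0,1\}^{n \times m}\) for \(n\) drugs and \(m\) diseases, and non-negative low-rank factors are sought whose product approximates it:

\[
\min_{A \ge 0,\; B \ge 0} \; \lVert Y - A B^{T} \rVert_{F}^{2},
\qquad A \in \mathbb{R}^{n \times k},\quad B \in \mathbb{R}^{m \times k},\quad k \ll \min(n, m).
\]

Under this formulation, the entry \((A B^{T})_{ij}\) can be read as a predicted score for the pair (drug \(i\), disease \(j\)), in the same way that a recommender system scores unseen user-item pairs.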
In this paper, we propose a new computational method named WNMFDDA, based on weighted graph regularized collaborative non-negative matrix factorization, to infer unknown drug-disease associations. Distinct from previous methods, graph Laplacian regularization is introduced to prevent overfitting and to ensure that similar drugs or diseases remain sufficiently close to each other in the corresponding latent feature space; Tikhonov (\(L_{2}\)) regularization is used to guarantee that the solution obtained from matrix factorization is smooth. In addition, to extend our model to new drugs (or new diseases) and to reduce the impact of sparse associations on prediction performance, weighted \(K\)-nearest neighbor profiles are used to rebuild the drug-disease association adjacency matrix before matrix factorization is performed. We carried out ten-fold cross validation to verify the performance of WNMFDDA and compared it with several classical models. The cross validation results show that WNMFDDA performs better than the compared models. Case studies on drugs and diseases also demonstrate that our approach is reliable in identifying potential drug-disease associations.
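As a rough illustration of the weighted \(K\)-nearest neighbor preprocessing described above, the following Python sketch rebuilds an association matrix from neighbor profiles. It is a sketch under stated assumptions, not the WNMFDDA implementation: the function names `wknn_profiles` and `rebuild_associations`, the decay factor `decay`, the averaging of drug-side and disease-side profiles, and the element-wise maximum with the original matrix are all illustrative choices.

```python
import numpy as np


def wknn_profiles(Y, S, K=5, decay=0.8):
    """Weighted K-nearest-neighbor profile for every row entity.

    Y: (n, m) association matrix; S: (n, n) similarity among the n row entities.
    The decay weighting is an assumption made for this sketch.
    """
    n = Y.shape[0]
    out = np.zeros_like(Y, dtype=float)
    k = min(K, n - 1)  # cannot use more neighbors than other entities exist
    for i in range(n):
        sims = S[i].astype(float).copy()
        sims[i] = -np.inf               # exclude the entity itself
        nn = np.argsort(-sims)[:k]      # indices of the k most similar entities
        w = decay ** np.arange(k) * sims[nn]  # closer neighbors weigh more
        z = sims[nn].sum()              # normalization term
        if z > 0:
            out[i] = w @ Y[nn] / z
    return out


def rebuild_associations(Y, S_drug, S_dis, K=5, decay=0.8):
    """Rebuild the drug-disease matrix from drug-side and disease-side profiles."""
    Y = np.asarray(Y, dtype=float)
    Y_drug = wknn_profiles(Y, S_drug, K, decay)      # neighbors among drugs (rows)
    Y_dis = wknn_profiles(Y.T, S_dis, K, decay).T    # neighbors among diseases (columns)
    # Assumption: average both views and never lower a known association.
    return np.maximum(Y, (Y_drug + Y_dis) / 2.0)
```

With a step of this kind, a drug whose row in the matrix is all zeros inherits a soft association profile from its most similar drugs, which gives the subsequent factorization something to work with for that drug.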
Conclusions
Identifying new indications for existing drugs is a promising alternative to de novo drug development: it not only saves time and cost, but also reduces risk and expedites drug approval. In this work, a model based on weighted non-negative matrix factorization, WNMFDDA, was proposed to predict potential drug-disease associations. Unlike traditional computational methods, WNMFDDA reformulates the association adjacency matrix using weighted \(K\) nearest neighbor profiles as a preprocessing step, which enables it to infer potential associations for novel drugs or diseases that have no known associated diseases or drugs. Graph regularized matrix factorization is then used to calculate the association scores.
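The scoring step summarized above can be sketched as a graph regularized non-negative matrix factorization with Tikhonov (\(L_{2}\)) regularization. The sketch below uses standard multiplicative updates for this family of models; the objective, the parameter names (`lam_l`, `lam_d`, `lam_t`, `k`), the random initialization and the function name `graph_regularized_nmf` are assumptions for illustration and may differ from the actual WNMFDDA formulation. The similarity matrices are assumed to be symmetric and non-negative.

```python
import numpy as np


def graph_regularized_nmf(Y, S_drug, S_dis, k=10, lam_l=0.1, lam_d=0.1,
                          lam_t=0.1, n_iter=200, eps=1e-9, seed=0):
    """Factorize Y (drugs x diseases) as A @ B.T with A, B >= 0.

    Assumed objective (illustrative):
      ||Y - A B^T||_F^2 + lam_l (||A||_F^2 + ||B||_F^2)
      + lam_d Tr(A^T L_drug A) + lam_t Tr(B^T L_dis B),
    where L = D - S is the graph Laplacian of each similarity matrix.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    A = rng.random((n, k))
    B = rng.random((m, k))
    D_drug = np.diag(S_drug.sum(axis=1))  # degree matrix of the drug graph
    D_dis = np.diag(S_dis.sum(axis=1))    # degree matrix of the disease graph
    for _ in range(n_iter):
        # Multiplicative updates keep the factors non-negative.
        A *= (Y @ B + lam_d * S_drug @ A) / (
            A @ (B.T @ B) + lam_l * A + lam_d * D_drug @ A + eps)
        B *= (Y.T @ A + lam_t * S_dis @ B) / (
            B @ (A.T @ A) + lam_l * B + lam_t * D_dis @ B + eps)
    return A, B
```

The predicted score matrix is then `A @ B.T`, and candidate associations for a drug can be ranked by sorting the corresponding row.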
We conducted ten-fold cross validation (10-CV) on two datasets and case studies on Fdataset to verify the performance of our model. Comprehensive experimental results demonstrate that WNMFDDA outperforms other state-of-the-art approaches and can effectively infer potential drug-disease associations. We believe that WNMFDDA will be helpful to biomedical researchers in follow-up studies. However, WNMFDDA still has some limitations. Firstly, the experimentally verified drug-disease associations used in this work are relatively sparse. Secondly, determining the optimal parameter combination for different biological datasets remains a daunting task. Finally, how to reasonably incorporate more effective drug and disease features to enhance the performance of WNMFDDA deserves further research.
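For readers who wish to reproduce an evaluation of this kind, the ten-fold cross validation mentioned above can be sketched as follows. This is a common protocol for association prediction and is assumed here, not confirmed as the exact procedure used for WNMFDDA: known associations are shuffled, split into ten folds, and each fold in turn is hidden from the training matrix. The function name `kfold_association_masks` is hypothetical.

```python
import numpy as np


def kfold_association_masks(Y, n_folds=10, seed=0):
    """Yield (Y_train, held_out) pairs, hiding one fold of known associations at a time.

    Y: binary (drugs x diseases) matrix; held_out: array of (row, col) index pairs.
    """
    rng = np.random.default_rng(seed)
    # Shuffle the (row, col) positions of all known associations.
    pos = rng.permutation(np.argwhere(Y == 1))
    for fold in np.array_split(pos, n_folds):
        Y_train = Y.copy()
        Y_train[fold[:, 0], fold[:, 1]] = 0  # mask the held-out associations
        yield Y_train, fold
```

Each `Y_train` would be passed to the prediction model, and the scores at the held-out positions compared against the scores of unknown pairs, for example with an ROC curve.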