| Since clinical observation serendipitously revealed an indication for sildenafil beyond its original target, drug development has gradually given rise to a new field: drug repositioning. This field focuses on finding new disease indications for existing drugs, giving old drugs a new lease of life and turning failed drugs into valuable assets. Because the drugs concerned have already been clinically assessed for toxicity and side effects, drug repositioning greatly reduces both the economic cost and the time to market, and it has attracted growing discussion and research in recent years. To learn the highly nonlinear characteristics of heterogeneous networks and to extend more readily to multi-source data, this paper adopts a research framework that combines graph representation learning with link prediction and establishes two models. A review of the existing literature shows that few studies focus on heterogeneous semantic extraction and the fusion of multi-source data. The proposed model I therefore uses meta-path graphs to sample the neighborhoods of nodes in heterogeneous networks, so that the topological structure carries complex semantic information, and uses this sampling to improve the graph convolution network model. Compared with a graph convolution model improved by random-walk graph sampling, model I increases both AUROC and AUPRC by 4.7% and the F1 score by 2.1%, which shows that the idea behind model I benefits semantic extraction in heterogeneous networks. Compared with graph models and matrix decomposition models, the AUROC, AUPRC, and F1 score of model I are 8.1%, 14.8%, and 10.3% higher, respectively, than the best performance in the comparative experiments. This paper further proposes model II, HGAlinker, by extending the dataset and accounting for the differences among heterogeneous nodes and edges. It uses meta-paths to mine three kinds of heterogeneous graph patterns for drug and disease nodes, so that each pattern carries different semantic information. Because the graph convolution model treats all neighborhood nodes equally, model II adopts a heterogeneous graph attention network to learn and fuse node-level and semantic-level information hierarchically, and feeds the fused result into a downstream link prediction model. Model II is compared with graph models, matrix decomposition models, and deep learning models along three dimensions: learning ability, robustness, and biological interpretability. It exceeds the best performance in the comparative experiments by 1.3% in both AUROC and AUPRC, and its recall at K remains above 0.4 across the experimental range. The paper also analyzes the parameter sensitivity of model II and illustrates its biological interpretability through two case studies. |
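
The meta-path neighborhood sampling mentioned above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the node types (`drug`, `target`, `disease`), the toy edges, and the helper names `build_adjacency` and `metapath_neighbors` are all hypothetical and chosen only to show how a meta-path restricts which neighbors a node sees.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def metapath_neighbors(adj, node_type, start, metapath):
    """Return the nodes reachable from `start` by following `metapath`.

    `metapath` is a sequence of node types, e.g. ("drug", "target", "drug").
    Each hop keeps only neighbors whose type matches the next entry.
    """
    if node_type[start] != metapath[0]:
        return set()
    frontier = {start}
    for wanted in metapath[1:]:
        frontier = {
            v
            for u in frontier
            for v in adj.get(u, ())
            if node_type[v] == wanted
        }
    frontier.discard(start)  # a node is not its own meta-path neighbor
    return frontier


# Hypothetical toy network: three drugs, two targets, one disease.
node_type = {
    "d1": "drug", "d2": "drug", "d3": "drug",
    "t1": "target", "t2": "target",
    "s1": "disease",
}
edges = [("d1", "t1"), ("d2", "t1"), ("d3", "t2"), ("d1", "s1"), ("d2", "s1")]
adj = build_adjacency(edges)

shared_target = metapath_neighbors(adj, node_type, "d1", ("drug", "target", "drug"))
shared_disease = metapath_neighbors(adj, node_type, "d1", ("drug", "disease", "drug"))
isolated = metapath_neighbors(adj, node_type, "d3", ("drug", "target", "drug"))
```

Each meta-path yields a different neighbor set for the same node, which is the sense in which different meta-paths carry different semantic information; an attention-based model could then weight these neighbor sets rather than treating them equally.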