| Drug discovery identifies potential target proteins by predicting interactions between drugs and targets (genes/proteins). In data-driven drug discovery, an effective way to predict drug-target relations from a constructed phenotype-drug-molecule multi-level knowledge graph is knowledge representation learning, which maps the high-dimensional, sparse knowledge graph into a low-dimensional, dense vector space to improve the computational efficiency of drug discovery. However, the phenotype-drug-molecule multi-level knowledge graph contains a large number of complex relations (N-N, 1-N, N-1), few relation types, and an uneven sample distribution, all of which pose great challenges to knowledge representation learning. This work addresses these problems as follows. (1) To address the problem that the many-to-many and multi-level complex relations of the knowledge graph are difficult to express fully, we propose a knowledge graph completion method for multi-level complex relations based on CP tensor decomposition. The method treats the knowledge graph as a third-order binary tensor and uses CP decomposition to decompose it into a sum of rank-one tensors, each the outer product of one column from the head-entity, relation, and tail-entity factor matrices, so that the score of each triple is the trilinear product of its head entity, relation, and tail entity embeddings. This sum is equivalently written as the product of a super-diagonal core tensor with the factor matrix of each mode, and a scoring function computes the probability that a triple with a missing relation is true. Link prediction experiments on benchmark knowledge graph datasets from four different domains show that the proposed method outperforms the comparison methods and can express the complex relations of the knowledge graph; in addition, the decomposition is unique, reduces the total amount of computation and the number of parameters, and avoids overfitting. (2) To address the fact that there are many 1-N and N-1 relations in a knowledge graph with many
nodes and few edges, which can easily confuse the semantics of entities, and the fact that the prediction of drug-target relations involves unimportant triples (knowledge graph noise) that are insignificant or occur rarely in practice, such as treatment, mapping, and pathway triples, we propose a noise-reduction-based drug-target relation prediction model. The model first determines the most valuable embedding network by screening entities and relations through data noise reduction, then models the tail entities of the screened golden triples by embedding them on a hyper-ellipsoid, and finally applies the result to drug-target relation prediction. Link prediction experiments show that the MRR increases by 14.6% over the CP tensor decomposition model on the Biomedical Knowledge Map (DDG) and by 12.5% over CP on the New Crown Research Viral Drug Atlas (ADKG). (3) To address the unbalanced sample distribution and the insufficient learning from negative sampling in the knowledge graph, we propose an adversarial drug-target relation prediction method. During pretraining, after the data noise reduction step of the noise-reduction-based drug-target relation prediction model, the method first applies adversarial negative sampling to construct adversarial samples and mixes them with the original samples, so as to optimize the knowledge representation vectors of few-sample triples and improve the stability of the generative adversarial network. All samples are then fed into the noise-reduction-based drug-target relation prediction model, which embeds the tail entities on the hyper-ellipsoid during training to produce a strong prediction model. Finally, the model is applied to predict all possible new interactions between unknown drugs and targets. Experimental results show that the MRR improves by 6.9% on DDG and by 8.5% on ADKG compared with the noise-reduction-based drug-target relation prediction model. |
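
The CP-based scoring described in part (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the embedding rank, the random initialisation, the use of a sigmoid to turn the score into a probability, and all function names (`cp_score`, `score_via_core`, `triple_probability`, `rank_tails`) are assumptions made here for illustration. The sketch shows that the trilinear score of a triple equals the corresponding entry of a super-diagonal core tensor multiplied by the factor vectors of each mode.

```python
import math
import random


def cp_score(head_vec, rel_vec, tail_vec):
    """Trilinear CP score of one triple: sum over k of h_k * r_k * t_k."""
    return sum(h * r * t for h, r, t in zip(head_vec, rel_vec, tail_vec))


def score_via_core(core_diag, head_vec, rel_vec, tail_vec):
    """Same score written as a core tensor times one factor vector per mode.

    The core tensor is super-diagonal: entry (a, b, c) is core_diag[a]
    when a == b == c and zero otherwise.
    """
    rank = len(core_diag)
    total = 0.0
    for a in range(rank):
        for b in range(rank):
            for c in range(rank):
                core = core_diag[a] if a == b == c else 0.0
                total += core * head_vec[a] * rel_vec[b] * tail_vec[c]
    return total


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_embeddings(count, rank, rng):
    """Random embeddings; a trained model would learn these values."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(rank)] for _ in range(count)]


# Hypothetical toy sizes, chosen only for illustration.
RANK, NUM_ENTITIES, NUM_RELATIONS = 4, 5, 2
rng = random.Random(0)
# CP keeps separate embeddings for an entity as head and as tail.
head_emb = init_embeddings(NUM_ENTITIES, RANK, rng)
tail_emb = init_embeddings(NUM_ENTITIES, RANK, rng)
rel_emb = init_embeddings(NUM_RELATIONS, RANK, rng)


def triple_probability(h, r, t):
    """Assumed probability that triple (h, r, t) is true: sigmoid of the score."""
    return sigmoid(cp_score(head_emb[h], rel_emb[r], tail_emb[t]))


def rank_tails(h, r):
    """Candidate tail entities for (h, r, ?), most probable first."""
    return sorted(range(NUM_ENTITIES), key=lambda t: -triple_probability(h, r, t))
```

In a link prediction setting, `rank_tails(h, r)` would be evaluated against the held-out true tail to obtain rank-based metrics such as MRR; with untrained random embeddings the ranking is of course meaningless.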