Strong drug-target interactions are often required for drug molecules to exert their efficacy. Compared with methods that identify drug-target interactions through biochemical experiments, prediction methods based on computational simulation have attracted much attention because of their low cost and high efficiency. Developing drug-target interaction prediction methods can not only promote the identification of potential drug targets but also accelerate the discovery of hit compounds, which is of great practical significance. With the advent of the big data era, artificial intelligence techniques, especially the emerging graph neural network framework, are being used to mine general laws of drug-target interactions from large-scale biomedical datasets. Because graph neural networks can handle irregular data structures such as small molecules and protein pockets and have powerful feature extraction capabilities, they provide important technical support for the development of new drug-target interaction prediction algorithms and show broad application prospects in drug design.

This dissertation focused on drug-target interactions studied with graph neural networks. It reviewed the application of graph neural networks in the theoretical study of drug-target interactions (Chapter 1), and then used the graph convolutional neural network framework to study drug-target interactions from the perspectives of transcriptomics (Chapter 2) and protein-ligand binding conformations (Chapter 3).

The study of a drug target prediction algorithm based on gene expression profiling explored the potential targets of compounds from the perspectives of cellular transcriptomics and RNA biology. Although compound perturbation profiles and gene knockdown profiles contain rich information on drug-target interactions, the complex changes of biological networks under different conditions make it difficult to mine drug-target interactions from high-dimensional and high-noise gene expression
profiles. To address these problems, this work drew on the ideas of contrastive learning and metric learning and designed an architecture named the Siamese spectral-based graph convolutional network, which integrates biological network information under different conditions into a deep learning model to identify drug targets. Whereas traditional machine learning methods usually rely on hand-designed descriptors to describe the local structure of protein-protein interaction networks, this model can consider the interactions between nodes in biological networks more systematically. In addition, the model can remove noise from gene expression profiles to recover the hidden correlation between compound perturbation profiles and gene knockdown profiles. As a result, on the benchmark dataset the model achieved higher target prediction accuracy than previous methods such as CMap and a random forest-based method. We further validated the model experimentally and applied it to practical target identification and drug screening problems. In the first application scenario, we established a compound-centric target prediction pipeline and took the prediction of potential host targets of nelfinavir as a case study; the experimental results verified that cyclophilin A is a target of nelfinavir. In the second application scenario, we established a target-centric drug screening pipeline, with the screening of ectonucleotide pyrophosphatase/phosphodiesterase 1 inhibitors as a case study; the experiments identified and confirmed the drug methotrexate as a new ectonucleotide pyrophosphatase/phosphodiesterase 1 inhibitor. In general, the compound-centric and target-centric pipelines based on the Siamese spectral-based graph convolutional network model can be used to infer drug targets and identify compounds with specific targets.

The study of a drug-target interaction algorithm based on atom pair potentials explored a virtual screening strategy for novel
target systems from the perspective of protein-ligand binding conformations. The Protein Data Bank contains large-scale data on nonbonded interactions, from which graph neural networks can learn intermolecular interactions. We collected protein-ligand complex structures from the Protein Data Bank and constructed a larger unbiased dataset to enrich the existing scoring function datasets. Then, building on Attentive FP, a graph convolutional neural network developed by our research group, we proposed a graph neural network model that aggregates node features based on physical atom pair potentials, and developed a novel scoring function algorithm named Graph Potential. Traditional machine learning methods often characterize protein-ligand interactions with descriptors such as physics-based energy terms, knowledge-based potential functions, statistics-based atom pair occurrence frequencies, and surface area changes. Graph Potential describes protein-ligand interactions mainly through physics-based phenomenological atom pair potentials, which can account for a wider range of protein-ligand interaction types. In addition, the introduction of cross-screening decoys enables Graph Potential to actively learn the interaction patterns in the protein-ligand binding conformation rather than merely memorizing the ligand topology. As a result, on the benchmark dataset the screening ability of Graph Potential is superior to that of the commonly used Glide Score-SP and other machine learning algorithms, placing it among the best-performing current methods. Incorporating physical priors into deep learning models, as Graph Potential does, is a promising approach to improving the generalization ability of scoring function models.

Overall, the dissertation uses emerging graph neural network technology to study the key issue of drug-target interactions from the perspectives of transcriptomics and protein-ligand binding conformation. The results show that the graph neural network can learn the general laws of
drug-target interactions from irregular and high-dimensional biomedical data, owing to its powerful representation learning ability.
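
The Siamese comparison of compound perturbation profiles and gene knockdown profiles summarized above can be illustrated with a small sketch. It is not the dissertation's implementation. It assumes a single first-order approximation of spectral graph convolution, one gene network shared by both branches, mean pooling over nodes, and cosine similarity as the metric. The toy network, profiles, weights, and all function names are hypothetical.

```python
import math

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(a_norm, x, weights):
    """One graph convolution, ReLU(A_norm X W); X is n x f, W is f x h."""
    n, f, h = len(x), len(weights), len(weights[0])
    xw = [[sum(x[i][k] * weights[k][c] for k in range(f)) for c in range(h)]
          for i in range(n)]
    out = [[sum(a_norm[i][j] * xw[j][c] for j in range(n)) for c in range(h)]
           for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]

def embed(a_norm, profile, weights):
    """Embed one expression profile: one feature per gene node, mean-pooled."""
    x = [[v] for v in profile]
    hidden = gcn_layer(a_norm, x, weights)
    n = len(hidden)
    return [sum(row[c] for row in hidden) / n for c in range(len(hidden[0]))]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def siamese_score(adj, compound_profile, knockdown_profile, weights):
    """Both branches share the same graph and weights (the Siamese part)."""
    a_norm = normalized_adjacency(adj)
    return cosine(embed(a_norm, compound_profile, weights),
                  embed(a_norm, knockdown_profile, weights))

# Hypothetical toy data: a 4-gene path network and two expression profiles.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
weights = [[1.0, -1.0]]            # untrained; separates up- and down-regulation
compound = [1.0, -0.5, 0.2, 0.0]   # compound perturbation profile
knockdown = [0.8, -0.4, 0.1, 0.1]  # gene knockdown profile

score = siamese_score(adj, compound, knockdown, weights)
```

In a trained model of this kind, the weights would be learned with a contrastive or metric-learning objective so that a compound profile scores higher against knockdown profiles of its true targets than against those of unrelated genes. Candidate targets could then be ranked by this score.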