A Deep Learning Model For Drug Virtual Screening

Posted on: 2022-12-22  Degree: Master  Type: Thesis
Country: China  Candidate: Y W Zheng
GTID: 2504306779969519  Subject: Pharmaceutics
Abstract/Summary:
Drug design and discovery is an important stage of drug research and development, in which active hit compounds are sought for specific targets. The traditional approach screens for hit compounds through high-throughput experiments, which is time-consuming, expensive, and has a low success rate. With the rapid development of computer technology, virtual screening has gradually matured. It uses computation to screen active compounds from large compound databases, which greatly shortens the time needed to find hit compounds and also improves the effectiveness and accuracy of screening. In this thesis, three datasets (DUD-E, MUV and Kernie) are used for experiments, and a deep learning model for structure-based virtual screening is established. The main research contents and conclusions are as follows.

1. We extract features from the ligand-target complexes generated by Smina, a molecular docking program. The features include atom type, atomic charge, distance from the reference atom, and the amino acid type associated with each atom. We propose Deffini, a neural network model for structure-based virtual screening, and optimize its hyperparameters with the Tree-structured Parzen Estimator algorithm.

2. We use three-fold clustered cross-validation: the data is divided into three folds according to the sequence similarity of the targets, and samples of similar targets are placed in the same fold. This helps avoid overestimating model performance because of high similarity between the testing set and the training set. Three-fold clustered cross-validation is carried out on DUD-E with Deffini.

3. Three-fold clustered cross-validation is also carried out on DUD-E with four comparison models: Smina, and three deep learning models (Gan DTI, a CNN model for ligand-based virtual screening, and a Transformer model for ligand-based virtual screening). We compare the models on AUC-ROC, AUC-PRC, the 1% enrichment factor (EF1%) and the 5% enrichment factor (EF5%), and analyze the differences in performance between them. The three deep learning models perform significantly better than Smina, and Deffini performs best.

4. We build a family-specific model of the target protein, in which the targets of the training set and the testing set come from the same protein family. Compared with the pan-family model, the family-specific model shows significantly better performance and generalization ability. Because DUD-E contains relatively little kinase data, we explore the family-specific model further by training on Kernie, a new and larger protein kinase dataset, and, following the idea of transfer learning, testing on the MUV kinases. Compared with Deffini trained on the DUD-E kinases, Deffini trained on Kernie achieves higher AUC-ROC, AUC-PRC, EF1% and EF5%, and shows stronger generalization ability and better model performance.

In this thesis, based on the three-dimensional structure of the ligand-target complex, we propose a CNN model for structure-based virtual screening (Deffini), and we compare and analyze the pan-family and family-specific training methods. Deffini and the family-specific training method improve the generalization ability, accuracy and effectiveness of virtual screening to a certain extent.
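Two ideas in the abstract, clustered cross-validation and the enrichment factor, can be illustrated with a short Python sketch. This is not the thesis code. It assumes the target clusters have already been computed from sequence similarity, that clusters are assigned to folds greedily by size, and that EF at x% is the hit rate among the top x% of ranked compounds divided by the hit rate in the whole set. The function names, target names, cluster labels and scores are all hypothetical.

```python
from collections import defaultdict


def clustered_folds(target_to_cluster, n_folds=3):
    """Assign targets to folds so that a cluster is never split across folds.

    Assumption: clusters come from a separate sequence-similarity step.
    Larger clusters are placed first, each into the currently smallest fold.
    """
    clusters = defaultdict(list)
    for target, cluster in target_to_cluster.items():
        clusters[cluster].append(target)

    fold_sizes = [0] * n_folds
    fold_of = {}
    ordered = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))
    for _, targets in ordered:
        fold = fold_sizes.index(min(fold_sizes))  # smallest fold so far
        for target in targets:
            fold_of[target] = fold
        fold_sizes[fold] += len(targets)
    return fold_of


def enrichment_factor(scores, labels, fraction):
    """EF at `fraction` (0.01 for EF1%); higher score = predicted more active.

    Assumption: the top slice holds round(N * fraction) compounds, at least 1.
    """
    n = len(scores)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, int(round(n * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    # (hits / n_top) / (n_actives / n), written to avoid rounding noise
    return (hits * n) / (n_top * n_actives)


# Hypothetical targets t1..t6 grouped into made-up clusters A..D.
folds = clustered_folds(
    {"t1": "A", "t2": "A", "t3": "B", "t4": "C", "t5": "C", "t6": "D"}
)

# Hypothetical screen: 100 compounds, 10 actives, scores already descending.
scores = [100 - i for i in range(100)]
active_ranks = {0, 2, 4, 10, 20, 30, 40, 50, 60, 70}
labels = [1 if i in active_ranks else 0 for i in range(100)]
ef1 = enrichment_factor(scores, labels, 0.01)
ef5 = enrichment_factor(scores, labels, 0.05)
```

Keeping every cluster inside one fold is what prevents near-identical targets from appearing in both the training and testing sets. An EF above 1 means the top-ranked slice is richer in actives than a random pick from the library.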
Keywords/Search Tags:Drug Virtual Screening, Deep Learning, Convolutional Neural Network, Structure-Based, Family-Specific Model