
Research On The Interactions Between Compounds And Proteins Based On Deep Learning

Posted on: 2020-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Li
GTID: 2404330596487361
Subject: Engineering·Computer Technology
Abstract/Summary:
Research on the interactions between compounds and proteins can identify compound-protein combinations that have important implications for drug design and development. Traditional drug research and development usually relies on experimental validation, which may miss important candidate combinations and suffers from long development cycles, high cost, and a low success rate. Clinical and animal cell experiments have by now accumulated a large amount of data on compound-protein interactions, which makes it possible to discover new combinations. In recent years, deep learning methods have made breakthroughs in many fields. Inspired by information processing in biological nervous systems, these methods can extract features automatically from large amounts of training data. Training on a dataset of millions of samples can therefore help us explore and discover new interaction patterns between compounds and proteins and then predict the proteins that interact with a specific compound, providing a narrow, relatively reliable set of hypotheses for experimental validation in drug design and development.

The main work of this thesis is to build and train a deep learning model for predicting compound-protein interactions using the TensorFlow framework. The data come from the BindingDB database. Compound-protein combinations extracted from BindingDB are taken as positive samples and labeled 1; randomly paired compounds and proteins, with the known positive pairs removed, are taken as negative samples and labeled 0. After the positive and negative samples are mixed, they are divided into training, validation, and test sets in the ratio 98:1:1.

The model is a composite network composed of three recurrent neural networks and a deep feedforward neural network. The first part consists of three dynamic RNN feature extraction networks, which extract features from the atom blocks of compounds, the chemical bond blocks of compounds, and the amino acid sequences of proteins, respectively. The outputs of the three networks are concatenated into one feature vector, which is used as the input to the second part. The second part is a fully connected neural network with five hidden layers, which learns the interactions between compounds and proteins. The last layer is the output layer, which has two nodes, with labels in one-hot form. In this way, the study of compound-protein interactions is transformed into a binary classification problem.

From the analysis of the raw data and the choice of approach, through writing and debugging the code, to training the final model, the research process lasted two years and tried nearly 100 solutions. Evaluated on the test set, the final model achieves an accuracy of 97.32%, an F1-score of 97.39%, and an AUC of 99.58%. These results suggest that applying deep learning to compound-protein interactions has important implications for exploring and discovering new compound-protein combinations.
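
The dataset construction described in the abstract (known pairs labeled 1, random non-positive pairs labeled 0, then a 98:1:1 split) can be sketched roughly as follows. This is only an illustration in plain Python, not the thesis code: the function names `build_dataset` and `split_98_1_1`, the toy identifiers, the number of negatives, and the fixed seed are all assumptions, since the abstract does not state how many negatives were drawn or how the sampling was implemented.

```python
import random


def build_dataset(positive_pairs, n_negative, seed=0):
    """Return shuffled (compound, protein, label) samples.

    Hypothetical helper: positives get label 1; negatives are random
    compound-protein pairs that are not known positives and get label 0.
    """
    rng = random.Random(seed)
    positives = set(positive_pairs)
    compounds = sorted({c for c, _ in positives})
    proteins = sorted({p for _, p in positives})

    # Guard against asking for more negatives than exist.
    max_negative = len(compounds) * len(proteins) - len(positives)
    if n_negative > max_negative:
        raise ValueError("not enough non-positive pairs to sample from")

    negatives = set()
    while len(negatives) < n_negative:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in positives:  # remove pairs that are known positives
            negatives.add(pair)

    samples = [(c, p, 1) for c, p in sorted(positives)]
    samples += [(c, p, 0) for c, p in sorted(negatives)]
    rng.shuffle(samples)  # mix positive and negative samples
    return samples


def split_98_1_1(samples):
    """Split already-shuffled samples into train/validation/test at 98:1:1."""
    n_val = len(samples) // 100
    n_test = len(samples) // 100
    n_train = len(samples) - n_val - n_test
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test


# Toy data standing in for BindingDB records (identifiers are made up).
toy_positives = [(f"C{i}", f"P{i % 20}") for i in range(100)]
samples = build_dataset(toy_positives, n_negative=100)
train, val, test = split_98_1_1(samples)
print(len(train), len(val), len(test))
```

Rejecting any sampled pair that appears in the positive set is one simple way to realize "randomly paired and removed positive ones"; note that such negatives are only presumed non-interacting, because an unrecorded pair is not necessarily a true negative.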
Keywords/Search Tags:deep learning, compound, protein, compound-protein interactions, binary classification