
Predictive Research on Protein-Compound Binding Based on Deep Learning

Posted on: 2021-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: F Hou
Full Text: PDF
GTID: 2404330611452013
Subject: Computer Science and Technology
Abstract/Summary:
Modern pharmacological research has accumulated a large amount of protein-compound binding data. However, many compounds still lack information on their binding to proteins, which limits the further rapid development of pharmacology. Traditional research methods based on pharmacological experiments demand heavy investment and long experimental periods, and they yield relatively little data. Deep learning is a technique that builds a neural network model and trains it on large amounts of historical data collected for a specific problem. The rapid development of computer hardware (CPUs, GPUs, etc.) has made deep learning practical. Therefore, based on the existing massive protein-compound binding data, a deep learning model is constructed and trained to extract protein-compound binding features, and predictions are made from these features. Model training can be completed in a short time and binding can be predicted for any protein-compound pair, which provides new clues for pharmacological research.

This thesis mainly uses the TensorFlow framework developed by Google to build neural network models for training and prediction. All data come from BindingDB, an international bioinformatics database. After the raw data are processed, each resulting sample indicates whether a compound can bind to a protein: positive samples, labeled 1, are protein-compound pairs that bind, while negative samples, labeled 0, are pairs in which the compound cannot bind to the protein. About 7 million samples are used in total, divided into three parts: ten thousand samples for validation, ten thousand for testing, and the remainder for training.

This thesis trains two models: a convolutional neural network model, M1, and a fully-connected neural network model, M2. Each model consists of two parts. In M1, the first part uses three different types of convolution kernels to extract features of the atomic blocks in compounds, the chemical bond blocks, and the proteins represented by amino acid sequences, respectively. In M2, the first part extracts features of the atomic blocks, chemical bond blocks, and protein blocks through three fully-connected networks. In both models, the second part is a fully-connected neural network with several hidden layers, in which the number of nodes decreases layer by layer. The final output layer has two nodes and uses one-hot encoding to indicate whether the compound and protein can bind.

The work of this thesis includes the following stages: theoretical preparation, downloading the original data, analyzing the data, processing the data, determining the model, writing the code, and training the model until results are obtained. The longest single experiment ran for nearly 400 hours. The M2 model is found to perform better than the M1 model, with an accuracy of 89% on the test set. This suggests that deep learning is a credible approach for predicting unknown protein-compound bindings and has reference value for pharmaceutical research and development.
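
The data split, the two-node one-hot labels, and the accuracy measure described in the abstract can be illustrated with a short sketch in plain Python. This is not the thesis code: the function names (`split_indices`, `one_hot`, `accuracy`), the random shuffling, the seed, and the ordering of the two output nodes are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def split_indices(n_samples, n_val=10_000, n_test=10_000, seed=0):
    """Shuffle sample indices and carve out validation and test sets.

    The remaining indices form the training set. Random shuffling and the
    seed are assumptions; the abstract only gives the split sizes.
    """
    if n_val + n_test >= n_samples:
        raise ValueError("validation + test must be smaller than the dataset")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


def one_hot(label):
    """Encode a binary label as a two-element one-hot vector.

    Assumed ordering: index 0 = "does not bind", index 1 = "binds".
    """
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    return [1 - label, label]


def accuracy(outputs, labels):
    """Fraction of samples whose larger output node matches the label.

    `outputs` is a list of two-element score pairs (one per sample),
    `labels` is a list of 0/1 integers of the same length.
    """
    if len(outputs) != len(labels) or not labels:
        raise ValueError("outputs and labels must be non-empty and equal in length")
    correct = 0
    for scores, label in zip(outputs, labels):
        predicted = 1 if scores[1] > scores[0] else 0
        if predicted == label:
            correct += 1
    return correct / len(labels)


if __name__ == "__main__":
    # Small demonstration; the thesis uses about 7 million samples.
    train, val, test = split_indices(100_000)
    print(len(train), len(val), len(test))
    print(one_hot(1), one_hot(0))
```

With roughly 7 million samples, holding out ten thousand each for validation and testing leaves well over 99% of the data for training, which is why the split is expressed in absolute counts rather than fractions.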
Keywords/Search Tags: deep learning, protein-compound, binding, convolutional neural network, fully-connected neural network