Research On Prediction Of Compound And Protein Binding Relationship Based On Composite Deep Learning Model

Posted on: 2022-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
GTID: 2491306491485444
Subject: Automation Technology
Abstract/Summary:
The binding relationship between compounds and proteins provides important guidance for drug development. Drug development today involves multiple components, multiple targets, and multiple modes of action, and it faces high costs, long cycles, and low success rates. Existing data on compound-protein binding have been accumulated through experimental research in modern biology, chemistry, and medicine and are collected in various knowledge bases. In recent years, deep learning models have been widely used to study compound-protein binding, but their predictions suffer from a high false-positive rate. Reducing the false-positive rate of compound-protein binding prediction is therefore of real practical importance.

This study proposes a method that predicts compound-protein binding with a positive and a negative neural network model whose outputs are combined by a decision fusion mechanism. The data come from the BindingDB database. Preprocessing yields samples of bound compound-protein pairs; unbound samples are then generated by randomly matching compounds with proteins and removing the pairs known to bind. The task is a supervised binary classification problem, so all samples are labeled. The positive model learns the characteristics of binding: bound samples are labeled 1 and unbound samples 0. The negative model learns the characteristics of non-binding: unbound samples are labeled 1 and bound samples 0.

Both models use the same composite architecture, which combines a recurrent neural network with a convolutional neural network. It consists of three parts. First, three LSTMs extract features from three variable-length inputs (the compound's atom block, the compound's chemical bond block, and the protein's amino acid sequence); their outputs are concatenated and reshaped to form the input of the convolutional neural network. Second, the convolutional neural network learns features of the compound-protein binding relationship, and a fully connected layer performs the binary classification on the features extracted by the convolutional module. Third, a softmax output layer converts the output of the fully connected layer into class probabilities, from which the class of the sample is predicted.

The positive and negative models each predict the class of every sample. Under the decision fusion mechanism, the samples that the negative model predicts as unbound are removed from the samples that the positive model predicts as bound, and the remaining samples form the final set of predicted bound samples.

The work spanned more than a year, from collecting, analyzing, and preprocessing the raw data, through generating the sample sets, building the models, and writing and debugging the code, to training the final models. Dozens of candidate solutions were tested before a set of optimal hyperparameters was found, and the final positive and negative models were trained with it. In a practical application test, 100 randomly selected compounds were paired with 7181 proteins and every pair was predicted. After decision fusion of the positive and negative models, accuracy rose from 94.61% to 98.42%, an increase of 3.81 percentage points; precision rose from 0.39% to 1.25%, a relative increase of about 220%; and the number of false-positive samples fell by 70.78%.

This study therefore reduces false-positive predictions of compound-protein binding to some extent, but a large gap remains: in the real data, the expected number of proteins bound by a compound is about 1.88, whereas the final prediction of this study gives an expectation of about 114.53. Closing this gap is left to follow-up research.
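The unbound-sample generation and the label flipping between the positive and negative models can be sketched as follows. This is a minimal, standard-library-only sketch, not the thesis code: it assumes compounds and proteins are identified by string IDs and that known bound pairs are held in a Python set; the function names `generate_unbound_pairs` and `label_for_models` are hypothetical. Random matching is implemented as rejection sampling: draw a random (compound, protein) pair and keep it only if it is not a known bound pair.

```python
import random


def generate_unbound_pairs(bound_pairs, compounds, proteins, n_samples, seed=0):
    """Randomly match compounds with proteins, discarding known bound pairs.

    bound_pairs: set of (compound_id, protein_id) tuples known to bind.
    Pairs absent from bound_pairs are only *assumed* to be unbound.
    """
    cset, pset = set(compounds), set(proteins)
    # Count only the known bound pairs that fall inside this compound x protein grid.
    known_in_grid = sum(1 for c, p in bound_pairs if c in cset and p in pset)
    if n_samples > len(cset) * len(pset) - known_in_grid:
        raise ValueError("not enough candidate unbound pairs")
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    compounds, proteins = sorted(cset), sorted(pset)
    unbound = set()
    while len(unbound) < n_samples:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in bound_pairs:  # remove known bound samples
            unbound.add(pair)
    return unbound


def label_for_models(bound_pairs, unbound_pairs):
    """Build training labels for the positive and negative models.

    Positive model: bound -> 1, unbound -> 0.
    Negative model: the same samples with labels flipped (unbound -> 1).
    """
    positive = [(pair, 1) for pair in sorted(bound_pairs)]
    positive += [(pair, 0) for pair in sorted(unbound_pairs)]
    negative = [(pair, 1 - label) for pair, label in positive]
    return positive, negative
```

Both models see exactly the same samples; only the meaning of label 1 changes. Note that a randomly matched pair that was simply never measured may in fact bind, so the unbound set can contain some label noise.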
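The softmax output layer and the decision fusion rule described in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that each model emits two raw scores per compound-protein pair in the order [class 0, class 1], that the predicted class is the one with the higher softmax probability, and that scores are stored in a dict keyed by pair; the function names are illustrative, not taken from the thesis.

```python
import math


def softmax(logits):
    """Numerically stable softmax: map raw scores to class probabilities."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return the index of the most probable class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def fuse_decisions(pos_logits, neg_logits):
    """Decision fusion: pairs the positive model calls bound (its class 1),
    minus pairs the negative model calls unbound (its class 1).

    pos_logits / neg_logits: dict mapping pair -> [score_class0, score_class1].
    """
    predicted_bound = {k for k, z in pos_logits.items() if predict_class(z) == 1}
    predicted_unbound = {k for k, z in neg_logits.items() if predict_class(z) == 1}
    return predicted_bound - predicted_unbound
```

Because the fused result is a subset of the positive model's predicted-bound set, fusion can only shrink the set of positive predictions: it trades some recall for fewer false positives, which matches the goal stated in the abstract.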
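The reported improvements mix an absolute change (accuracy, in percentage points) with a relative one (the 0.39% to 1.25% figure, which is read here as precision, since a value that low cannot be the accuracy of a model scoring above 94%). The short check below recomputes both changes from the reported percentages only; the helper names are hypothetical. Because almost all of the 100 x 7181 test pairs are unbound, accuracy is dominated by true negatives, while precision directly reflects the false positives the method targets.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of predicted bound pairs that are truly bound."""
    return tp / (tp + fp)


def relative_change(before, after):
    """Relative change; 2.2 means the value grew by 220% of its original size."""
    return (after - before) / before


# Reported results from the abstract (in percent).
acc_before, acc_after = 94.61, 98.42
prec_before, prec_after = 0.39, 1.25
n_pairs = 100 * 7181  # compound-protein pairs in the application test

acc_gain_points = acc_after - acc_before  # absolute gain, percentage points
prec_gain = relative_change(prec_before, prec_after)  # relative gain
print(f"{n_pairs} pairs: accuracy +{acc_gain_points:.2f} points, "
      f"precision +{prec_gain:.0%} relative")
```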
Keywords/Search Tags: deep learning, the binding relationship between compounds and proteins, composite model, decision fusion