Compound-Protein Interaction Prediction Based On Deep Learning

Posted on: 2019-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Wang
GTID: 2334330569489990
Subject: Software engineering
Abstract/Summary:
Identifying compound-protein interactions plays a crucial role in drug discovery and drug design, providing a valuable reference for understanding the efficacy, targeting, and adverse side effects of new drugs. The traditional approach verifies interactions between compounds and proteins experimentally, which is time-consuming and laborious; it is also impossible to test every compound-protein pair one by one, which severely limits the approach. Computational prediction of compound-protein interactions, by contrast, can exploit the computing power of modern machines and parallel algorithms to significantly reduce prediction time, and the prediction process is fast, intelligent, low-cost, and wide in coverage. These methods therefore have more practical value than the traditional approach.

There are many computational methods for identifying compound-protein interactions. The current mainstream is docking, which uses the “lock-and-key principle” of ligands and receptors to simulate the interaction between small-molecule ligands and macromolecular receptors. However, docking requires chemical expertise, which most computer professionals lack, and its accuracy is not high. The development of deep learning has greatly reduced the domain knowledge required of programmers and opened new possibilities for identifying compound-protein interactions.

The main task of this thesis is to construct a deep neural network model. The input layer is a 2640-dimensional vector representing 2640 features of the compound, and the output layer is a 10-dimensional vector representing 10 different proteins. The model is trained with the back-propagation algorithm and has five layers: one input layer, three hidden layers, and one output layer. The first hidden layer has 1000 nodes, the second has 800 nodes, and the third has 500 nodes. After manual screening, the 10 most frequently occurring proteins and the compounds that interact with them were extracted from the original data set and used as multi-label samples to train the model, and one-tenth of the total sample set was held out for prediction.

Hundreds of experiments were run, during which the model parameters were continuously adjusted and the results continuously optimized; a single experiment lasted more than 170 hours, and the work as a whole took more than 9 months. The best results show that, for multi-label classification with 10 labels, the model reaches an accuracy of 0.73. The thesis then scaled up the multi-label classification by expanding the number of labels from 10 to 100 and then to 1000, repeating the above process. The final results show that accuracy gradually decreases as the number of labels increases. Given the limits of the available hardware and the incompleteness of the data, there is still room to improve the current results, but they are sufficient to show that deep learning is an effective and feasible approach to identifying compound-protein interactions.
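
The architecture described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation. The layer widths (2640, 1000, 800, 500, 10) come from the abstract. The ReLU hidden activations, the sigmoid output units, the weight initialisation, the 0.5 decision threshold, and all function names are assumptions made for the example. Training by back-propagation is omitted.

```python
import math
import random

# Layer widths stated in the abstract: input, three hidden layers, output.
THESIS_LAYER_SIZES = [2640, 1000, 800, 500, 10]


def count_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def init_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per layer.

    The He-style scaling used here is an assumption; the abstract does
    not say how the weights were initialised.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(layers, features):
    """Return one independent interaction probability per protein."""
    activations = list(features)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        if index == last:
            # Sigmoid outputs (assumed): each label is decided on its own,
            # which is what makes the task multi-label.
            activations = [sigmoid(z) for z in pre]
        else:
            # ReLU hidden units (assumed).
            activations = [max(0.0, z) for z in pre]
    return activations


def predict_labels(probabilities, threshold=0.5):
    """Turn per-protein probabilities into a 0/1 label vector."""
    return [1 if p >= threshold else 0 for p in probabilities]


if __name__ == "__main__":
    # A tiny network keeps the pure-Python demo fast; pass
    # THESIS_LAYER_SIZES to init_network for the full-size model.
    net = init_network([6, 4, 3, 2], seed=1)
    print(predict_labels(forward(net, [1.0, 0.0, 1.0, 0.0, 1.0, 1.0])))
    print(count_parameters(THESIS_LAYER_SIZES))
```

With these layer widths the fully connected model has 3,847,310 trainable parameters, most of them in the first hidden layer. Extending the output layer to 100 or 1000 proteins, as the thesis does, changes only the last entry of the size list.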
Keywords/Search Tags:Deep Learning, Compound-Protein Interactions, Deep Neural Networks, Multi-label classification