
Research On Protein Interaction Sites Prediction Method Based On Ensemble Convolutional Network

Posted on: 2021-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: H X Zhu
GTID: 2370330620965714
Subject: Computer Science and Technology
Abstract/Summary:
Proteins are essential components of all cells and tissues in the human body and an important material basis of life activities. However, biological functions are rarely carried out by a single protein acting alone; they arise from interactions between proteins. Abnormal protein interactions disrupt cellular activity and function and can cause a wide range of diseases. In essence, proteins interact through the mutual binding of certain residues on them (a residue is what remains of an amino acid after a water molecule is removed as it joins the polypeptide chain), and these residues are called protein interaction sites. This thesis focuses on determining which residues of a protein take part in its interactions. The question matters for understanding the mechanisms of life activities, exploring the principles of protein interactions, and discovering new drug-target protein interactions.

Identifying interaction sites with biological experiments is slow and consumes substantial manpower and material resources, so computational prediction has become the mainstream approach. Many computational methods have been proposed, but their predictions still fall well short of experimental results. New computational methods are therefore needed to keep improving the accuracy of interaction site prediction, and this is the motivation for this study.

A review of previous computational methods shows that many of them share the following problems: (1) They typically take the features of several adjacent residues, flatten them into a one-dimensional vector, and feed that vector to a learning algorithm. This one-dimensional vectorization destroys the contextual relationships between residues and loses important feature information. (2) Most of the machine learning methods they use cannot learn the context of residues, which leads to unsatisfactory predictions. (3) They feed an indiscriminate combination of different feature types to the learning algorithm; because different features express different biological properties, mixing them indiscriminately may weaken the distinctiveness of each original feature.

To address these shortcomings, this thesis proposes the following: (1) It introduces the concept of feature graphs, which represent the features of a residue together with the contextual information of its neighboring residues. The feature graphs constructed here are a PSSM (position-specific scoring matrix) evolutionary feature graph, a PhyChem physicochemical feature graph, and a PSAIA structural feature graph. (2) It uses a deep convolutional neural network to learn from these feature graphs. The local connectivity and weight sharing of convolutional neural networks let them extract both the contextual information in a feature graph and the features of individual residues, and stacked convolutional layers extract higher-level abstract features from the feature graphs. (3) Because each type of original feature expresses different biological information, a separate deep convolutional network is trained on the feature graph built from each feature type, and the resulting learners are combined by ensemble learning. This avoids the effect of indiscriminately combining different features and also improves the predictive ability of the model.

To evaluate the proposed prediction method, ConvsPPIS, two protein interaction datasets were extracted, the interaction labels of their residues were computed, and several important parameters of the ConvsPPIS model were optimized. Compared with several previous computational methods, ConvsPPIS performs better, achieving an accuracy of 88%, a recall of 59%, a precision of 85%, an F1 score of 69%, and a Matthews correlation coefficient (MCC) of 65%. To test the generalization ability of the method, it was also evaluated on an independent test set, where it outperforms most other methods, with an accuracy of 70%, a recall of 54%, a precision of 40%, an F1 score of 46%, and an MCC of 26%. Given this performance, ConvsPPIS may serve as a useful reference for future research on protein interaction sites.
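The feature-graph idea in the abstract can be illustrated with a minimal sketch. It assumes that every residue already has a fixed-length feature vector (for example, one row of a PSSM). It also assumes that a residue's feature graph is the window of its 2k+1 sequence neighbors stacked as rows in sequence order, with zero rows padding positions past either end of the chain. The window size, the padding scheme, and the helper names `feature_graph` and `flatten` are hypothetical; the abstract does not specify them.

```python
from typing import List

Matrix = List[List[float]]


def feature_graph(features: Matrix, center: int, half_window: int) -> Matrix:
    """Build a (2*half_window + 1) x d "feature graph" for residue `center`.

    Hypothetical construction: row i holds the feature vector of residue
    center - half_window + i, so the sequence order of neighbours is kept.
    Positions outside the chain are zero-padded (an assumption).
    """
    if not features:
        raise ValueError("protein has no residues")
    d = len(features[0])
    rows: Matrix = []
    for pos in range(center - half_window, center + half_window + 1):
        if 0 <= pos < len(features):
            rows.append(list(features[pos]))  # copy the neighbour's features
        else:
            rows.append([0.0] * d)  # padding beyond the chain ends
    return rows


def flatten(graph: Matrix) -> List[float]:
    """The one-dimensional vectorization used by earlier methods, for contrast."""
    return [value for row in graph for value in row]


if __name__ == "__main__":
    # Toy protein: 4 residues with 2 features each (made-up numbers).
    toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(feature_graph(toy, center=0, half_window=1))
    # -> [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
```

The matrix keeps each neighbor on its own row, so a 2-D convolution can still tell which features belong to which adjacent residue. `flatten`, by contrast, reproduces the one-dimensional vector that the abstract criticizes.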
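The abstract's second and third ideas are local connectivity with weight sharing in a convolutional network, and ensemble learning over one learner per feature graph. Both can be sketched in a few lines. This is not the ConvsPPIS architecture. It uses a single untrained convolution layer with one hand-set kernel, followed by ReLU, global max pooling, and a sigmoid output. It also assumes soft voting (averaging predicted probabilities) as the ensemble rule, because the abstract does not say how the learners are combined. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_valid(graph: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """2-D convolution (cross-correlation, "valid" padding).

    Local connectivity: each output looks only at one kh x kw patch.
    Weight sharing: the same kernel is reused at every position.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(graph) - kh + 1
    out_w = len(graph[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = bias
            for a in range(kh):
                for b in range(kw):
                    s += graph[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out


def tiny_cnn_prob(graph: Matrix, kernel: Matrix, w_out: float, b_out: float) -> float:
    """Conv -> ReLU -> global max pool -> sigmoid, giving P(interaction site)."""
    fmap = conv2d_valid(graph, kernel)
    pooled = max(max(0.0, v) for row in fmap for v in row)
    return 1.0 / (1.0 + math.exp(-(w_out * pooled + b_out)))


def ensemble_prob(probs: Sequence[float]) -> float:
    """Soft voting (an assumption): average the per-feature-graph probabilities."""
    return sum(probs) / len(probs)


if __name__ == "__main__":
    graph = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3x2 feature graph
    kernel = [[1.0, 0.0], [0.0, 1.0]]             # hand-set, untrained
    print(conv2d_valid(graph, kernel))            # [[2.0], [1.0]]
    p = tiny_cnn_prob(graph, kernel, w_out=1.0, b_out=-2.0)
    print(p)                                      # 0.5
    # e.g. one probability each from PSSM, PhyChem and PSAIA learners
    print(ensemble_prob([p, 0.25, 0.75]))         # 0.5
```

In a real model each feature graph type would have its own trained, multi-layer network. Combining the outputs only at the probability level is what keeps the different feature types from being mixed indiscriminately at the input.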
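All five metrics reported in the abstract are derived from the binary confusion matrix of predicted versus true interaction-site labels. A minimal sketch of the standard formulas follows. The function name is hypothetical, and the counts in the example are made up for illustration rather than taken from the thesis.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (a common convention).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # a.k.a. sensitivity
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)              # harmonic mean of P and R
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # ranges from -1 to 1
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy counts, for illustration only.
    print(binary_metrics(tp=6, fp=2, tn=10, fn=2))
```

Because F1 is the harmonic mean of precision and recall, the reported figures can be cross-checked. The calculation 2 × 0.85 × 0.59 / (0.85 + 0.59) gives about 0.70, and 2 × 0.40 × 0.54 / (0.40 + 0.54) gives about 0.46. Both agree with the reported F1 scores of 69% and 46% once rounding of the precision and recall values is allowed for. Note that MCC lies between -1 and 1, so the reported 65% and 26% correspond to MCC values of 0.65 and 0.26.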
Keywords/Search Tags: Protein interaction sites, Deep convolutional neural network, Ensemble learning, Feature graphs