
Prediction Research Of Protein-Protein Interaction Based On Ensemble Of Support Vector Machine And Random Forest

Posted on: 2020-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Liu
Full Text: PDF
GTID: 2370330596470887
Subject: Computer application technology
Abstract/Summary:
Proteins are essential components of living organisms and take part in maintaining normal life activities. As the executors of cell function, most proteins interact with other proteins to form complexes that regulate these activities. The study of protein-protein interactions therefore matters for research on disease diagnosis and treatment, drug screening, and related areas. At present, high-throughput experimental techniques are widely used to determine protein interactions. However, their high cost in time and money makes large-scale application impractical, so predicting protein-protein interactions by computational methods is of great practical significance and has long been a hot topic in computational biology. To reduce the influence of redundant data on prediction accuracy, three kinds of sequence-derived features are selected: evolutionary conservation, co-evolution, and solvent accessibility. A discrete cosine transform is applied to the evolutionary conservation features, and the three kinds of features are then integrated and processed to construct a feature matrix. A classifier is built with an ensemble learning algorithm that combines a support vector machine and a random forest; the parameters of both base learners are optimized and the decision thresholds are selected. The feature matrix is fed into the classifier to predict protein interactions. The proposed method enriches the sequence information, combines the extracted features, and determines the classifier model through ensemble learning. To assess generalization, the model was also evaluated on other protein data sets, where it likewise shows good prediction performance. Classification results on the test set indicate that the proposed method achieves better prediction results than other methods for protein interaction prediction.
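The two steps that the abstract describes only in words, the discrete cosine transform of the conservation features and the thresholded combination of two classifiers, can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the evolutionary conservation features form a position-by-column profile (for example a PSSM-like matrix), that only the first few DCT coefficients per column are kept, and that the two classifier outputs are merged by weighted averaging. The function names, the `keep` and `weight` parameters, and the 0.5 threshold are all hypothetical.

```python
import math


def dct2(signal):
    """Orthonormal DCT-II of a 1-D sequence of numbers."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(signal)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def compress_profile(profile, keep):
    """Turn a variable-length profile (rows = residues, columns = scores)
    into a fixed-length vector: DCT along the sequence axis, then keep the
    first `keep` low-frequency coefficients of every column.
    Assumption: `profile` stands in for the conservation features."""
    n_cols = len(profile[0])
    features = []
    for c in range(n_cols):
        column = [row[c] for row in profile]
        coeffs = dct2(column)
        # Pad with zeros if the protein is shorter than `keep` residues.
        coeffs += [0.0] * max(0, keep - len(coeffs))
        features.extend(coeffs[:keep])
    return features


def ensemble_predict(p_svm, p_rf, weight=0.5, threshold=0.5):
    """Weighted average of two interaction probabilities, then a threshold.
    Assumption: the combination rule is illustrative; the abstract states
    only that thresholds are selected."""
    score = weight * p_svm + (1.0 - weight) * p_rf
    return score, score >= threshold


# Toy usage: a 4-residue profile with 2 score columns.
toy_profile = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
toy_features = compress_profile(toy_profile, keep=2)
toy_score, toy_label = ensemble_predict(0.8, 0.4)
```

Keeping only the low-frequency coefficients gives every protein a feature vector of the same length regardless of sequence length, which is what a support vector machine or random forest needs as input. In practice `p_svm` and `p_rf` would come from trained models rather than fixed numbers.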
Keywords/Search Tags:Protein-Protein Interaction, Sequence Information Characteristics, Ensemble Learning Algorithm, Support Vector Machine, Random Forest