Semi-supervised Prediction of Protein Interaction Sites from Unlabeled Sample Information

Posted on: 2020-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Mei
GTID: 2370330578970426
Subject: Control Engineering
Abstract/Summary:
Identifying protein interaction sites is of great importance in drug design. In practice, however, only a small fraction of protein interactions can be identified experimentally, so most sites on a protein sequence cannot be labeled as interface or non-interface sites. This limits the accuracy and generalization of interaction site prediction. This thesis predicts interaction sites mainly by exploiting information from unlabeled protein sites.

In the data processing stage, redundant protein chains were removed first, and preprocessing yielded 91 protein chains for the experiments. Residues were then defined and, based on the evolutionary conservation of amino acids, five features were extracted from the HSSP database and the ConSurf Server: the residue sequence profile, the residue sequence entropy and relative entropy, the residue conservation weight, and the residue evolutionary rate. These five conservation features were fused and re-encoded, and the resulting data set was used in the subsequent experiments.

In the site prediction stage, the thesis makes full use of the large number of unlabeled samples and proposes three semi-supervised support vector machine (SVM) models for predicting protein interaction sites. First, combining the label-mean and self-training ideas, two models are proposed: a label-mean self-training semi-supervised SVM based on multiple kernel learning (meanS3VM-mkl) and a label-mean self-training semi-supervised SVM based on iterative optimization (meanS3VM-iter). These models were then refined with a safe semi-supervised support vector machine (S4VM) to prevent performance degradation. The final predictions show that using unlabeled samples substantially improves prediction: accuracy is 12% higher than that of a classification model trained only on labeled samples. All three semi-supervised SVM models can predict the interaction sites. Among them, S4VM performs best, reaching an accuracy of 70.7%, with a sensitivity of 62.67% and a specificity of 78.72%. Compared with traditional experimental and computational methods, classification performance is greatly improved.
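
The accuracy, sensitivity and specificity quoted in the abstract are standard confusion-matrix ratios, with interface sites taken as the positive class. The following is a minimal sketch of how such figures are usually computed. The function names and the toy labels are hypothetical illustrations and are not taken from the thesis or its data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary labeling.

    `positive` marks the interface-site class (an assumption of this sketch).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def site_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity as fractions in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        # Share of all residues classified correctly.
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Share of true interface sites that were recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of true non-interface sites that were rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


if __name__ == "__main__":
    # Toy example only: 4 interface residues (1) and 6 non-interface residues (0).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(site_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy is useful here because accuracy alone can hide a model that favors one class over the other.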
Keywords: Protein interaction site, Unlabeled samples, Conservation features, Semi-supervised support vector machine, Multiple kernel learning, Iterative optimization