
Prediction Of Protein Subnuclear Localization Based On Different Features

Posted on: 2020-03-13; Degree: Master; Type: Thesis
Country: China; Candidate: M J Li
GTID: 2370330578956452; Subject: Biophysics
Abstract/Summary:
The cell nucleus is the most important organelle in eukaryotic cells. It is the site of gene replication and RNA transcription and the control center of cellular activity. Because a protein's function is closely related to its location in the cell, it is important to accurately identify nuclear proteins among large numbers of proteins and, further, to determine their subnuclear localization.

In this thesis, a dataset of nuclear and non-nuclear proteins was constructed from the latest release of the UniProtKB/Swiss-Prot database. Five kinds of feature information were selected: N-terminal amino acid composition, protein blocks dipeptide composition, amino acid index information, protein-protein interaction information, and gene ontology annotation information. The nuclear/non-nuclear dataset was then predicted with the support vector machine (SVM) algorithm. Among the single features, gene ontology annotation and protein-protein interaction information performed best, each with an overall prediction accuracy above 80%. After the features were filtered and fused, the overall prediction accuracy reached 89.11% in 5-fold cross-validation.

Two protein subnuclear localization datasets, N1127 and N1044, were then constructed. Four kinds of feature information were selected: amino acid composition, protein blocks dipeptide composition, protein-protein interaction information, and gene ontology annotation information, and both datasets were again predicted with the SVM algorithm. Among the single features, gene ontology annotation and protein-protein interaction information again performed best. The features were then fused and the optimal parameter combination was selected. Combining all four kinds of feature information gave the best overall prediction accuracies: 69.40% on N1127 and 74.46% on N1044 in 5-fold cross-validation. These results show that selecting appropriate feature information, using an effective algorithm, and combining complementary features yields better prediction results.
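The sequence-derived features and the evaluation protocol described above can be illustrated with a minimal sketch. It assumes plain frequency normalization, an N-terminal window of 50 residues (`n_term=50` is a hypothetical choice; the abstract does not give the length), feature fusion by simple concatenation, and contiguous, unshuffled folds; none of these details are confirmed by the thesis. All function names are hypothetical. The protein blocks alphabet is taken to be the 16 structural-alphabet letters `a` to `p`. The SVM itself is not implemented here: `train_and_predict` is a placeholder for any classifier, such as a wrapper around an SVM library. Protein-protein interaction and gene ontology features depend on external databases and are not covered.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
PROTEIN_BLOCKS = "abcdefghijklmnop"    # assumed 16-letter protein blocks alphabet


def composition(seq, alphabet=AMINO_ACIDS):
    """Fraction of each alphabet symbol in seq; symbols outside the alphabet are ignored."""
    counts = dict.fromkeys(alphabet, 0)
    for ch in seq:
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in alphabet]


def n_terminal_composition(seq, n_term=50, alphabet=AMINO_ACIDS):
    """Composition of the first n_term residues (window length is a hypothetical choice)."""
    return composition(seq[:n_term], alphabet)


def dipeptide_composition(seq, alphabet=AMINO_ACIDS):
    """Fraction of each ordered pair of adjacent symbols: len(alphabet)**2 values.
    Pass alphabet=PROTEIN_BLOCKS to encode a protein blocks string (256 values)."""
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in pairs]


def fuse(*feature_vectors):
    """Feature fusion by plain concatenation (assumed; the thesis's scheme may differ)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def k_fold_indices(n_samples, k=5):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(X, y, train_and_predict, k=5):
    """Overall accuracy pooled over k folds: correct predictions / total samples.
    train_and_predict(X_train, y_train, X_test) -> predicted labels for X_test."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = train_and_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X)
```

For example, `fuse(composition(seq), dipeptide_composition(pb_string, PROTEIN_BLOCKS))` yields one 20 + 256 = 276-dimensional vector per protein (where `pb_string` is a hypothetical protein blocks assignment for that protein). In practice one would shuffle, and usually stratify by class, before assigning folds, and would pass an SVM wrapper (for instance around scikit-learn's `sklearn.svm.SVC`) as `train_and_predict`.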
Keywords: Nucleoprotein, Protein blocks, Gene ontology, Protein-protein interaction, Support vector machine