
Protein Subcellular Localization Prediction Based On Feature Hidden Correlation Modeling

Posted on: 2018-11-29
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhou
Full Text: PDF
GTID: 2370330596489134
Subject: Control Engineering
Abstract/Summary:
Protein subcellular localization is crucial for understanding protein functions, regulation mechanisms, and protein-protein interactions. However, identifying a protein's cellular compartment with wet-lab experiments is often laborious and costly, so in-silico prediction tools are highly desirable when working with large-scale data sets of proteins with unknown locations. A key step in these predictors is encoding the amino acid sequences into feature vectors. Many studies have shown that features extracted from biological domain knowledge, such as gene ontology and functional domains, can be very useful for improving prediction accuracy. However, domain knowledge usually results in redundant features and high-dimensional feature spaces, which may degrade the performance of machine learning models. In addition, proteins may exist simultaneously at two or more subcellular locations, but few effective methods are available for predicting proteins with multiple subcellular localizations. In this paper, we propose a new amino acid sequence-based approach to human protein subcellular location prediction, Hum-mPLoc 3.0, which covers 12 human subcellular localizations. The sequences are represented by multi-view complementary features, i.e., context vocabulary annotation-based gene ontology (GO) terms, peptide-based functional domains, and residue-based statistical features. To systematically reflect the structural hierarchy of the domain knowledge bases, we propose a novel feature representation protocol denoted HCM (Hidden Correlation Modeling), which creates more compact and discriminative feature vectors by modeling the hidden correlations between annotation terms. We compared the performance of the proposed method with that of other predictors on four datasets. The results show that our method outperforms the others. A large-scale application of Hum-mPLoc 3.0 to the whole human proteome in Swiss-Prot reveals the co-localization preferences of proteins in the cell.
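To make the sequence-encoding step concrete, here is a minimal Python sketch of amino acid composition, one of the simplest residue-based statistical features. It is an illustrative assumption only: the abstract does not specify which residue-based statistics Hum-mPLoc 3.0 uses, and this is not the HCM protocol. The names `AMINO_ACIDS` and `amino_acid_composition` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed order defines the
# vector layout). This ordering is an assumption made for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Encode a protein sequence as a 20-dimensional frequency vector.

    Non-standard symbols (e.g. 'X', 'B') are ignored. An empty or fully
    non-standard sequence maps to the all-zero vector.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy sequence, not taken from any dataset in the thesis.
    vector = amino_acid_composition("AACD")
    print(dict(zip(AMINO_ACIDS, vector)))
```

A fixed-length vector like this can be fed to any standard classifier regardless of sequence length, which is why such encodings are a common baseline alongside annotation-based features.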
Keywords/Search Tags: subcellular localization, multi-label, correlation, Gene Ontology, Machine Learning