
The Research On Prediction Of Protein Subcellular Location Using Multi-information Fusion Based On Sequence

Posted on: 2012-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: J B Jiang
GTID: 2230330395485386
Subject: Computer Science and Technology

Abstract/Summary:
Protein subcellular localization is closely correlated with protein function. Studying the mechanisms of protein sorting and predicting the subcellular localization of proteins can provide useful insights into their functions and into how they interact with one another. Traditional biochemical experiments can determine protein subcellular localization, but they are time-consuming and expensive. There is therefore strong demand for automated methods that can predict the subcellular localization of a protein rapidly and accurately. Focusing on this problem, this thesis studies protein sequence encoding, develops classification algorithms, and tests and analyzes these algorithms on several datasets. The main work and contributions of this thesis are summarized as follows.

First, the thesis introduces a new approach to encoding protein sequences based on the physicochemical properties of amino acid residues. Each protein is represented by a combined primary-sequence feature, a 146-dimensional (146D) vector consisting of 20 amino acid composition values and 126 contents of adjacent residue triplets. To evaluate the prediction performance of this encoding scheme, re-substitution tests and jackknife tests using the support vector machine (SVM) algorithm were carried out on three datasets. The satisfactory results indicate that the method offers an alternative way of predicting protein subcellular localization.

Second, to capture more of the structural and functional information contained in protein sequences, the thesis proposes a novel representation that combines position-weighted amino acid composition, dipeptide composition, and amino acid refractivity correlation coefficients. This representation aims to incorporate amino acid position information, local sequence-order information, and long-range interactions between residues along the sequence.
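The sequence-derived feature families mentioned above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's own code: it computes the 20-D amino acid composition, the 400-D dipeptide composition, and a set of sequence-order correlation factors for one numeric residue property. The correlation term follows the squared-difference form used in Chou's pseudo amino acid composition; the thesis's exact definition of its refractivity correlation coefficients, its position weighting, and its 126 triplet features are not given in the abstract, so they are not reproduced. All function names are hypothetical, and the property table is an input: real refractivity values must be supplied by the caller.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def _check(seq):
    """Upper-case the sequence and reject non-standard residue letters."""
    seq = seq.upper()
    bad = set(seq) - set(AMINO_ACIDS)
    if bad:
        raise ValueError(f"non-standard residues: {sorted(bad)}")
    return seq


def aa_composition(seq):
    """20-D vector: fraction of each standard residue in the sequence."""
    seq = _check(seq)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-D vector: fraction of each adjacent residue pair (local order).

    Assumes len(seq) >= 2.
    """
    seq = _check(seq)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = len(seq) - 1
    return [counts[d] / total for d in DIPEPTIDES]


def correlation_factors(seq, prop, max_lag):
    """Sequence-order correlation factors for one residue property.

    prop maps each residue letter to a raw property value (hypothetically,
    refractivity supplied by the caller). Values are standardized over the
    20 residues; for each lag k the mean squared difference between the
    standardized values of residues i and i+k is returned.
    """
    seq = _check(seq)
    if not 1 <= max_lag < len(seq):
        raise ValueError("max_lag must be between 1 and len(seq) - 1")
    raw = [prop[a] for a in AMINO_ACIDS]
    mean = sum(raw) / 20
    sd = (sum((v - mean) ** 2 for v in raw) / 20) ** 0.5
    z = {a: (prop[a] - mean) / sd for a in AMINO_ACIDS}
    factors = []
    for k in range(1, max_lag + 1):
        diffs = [(z[seq[i]] - z[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        factors.append(sum(diffs) / len(diffs))
    return factors


def feature_vector(seq, prop, max_lag=5):
    """Concatenate the three feature groups into one numeric vector."""
    return (aa_composition(seq) + dipeptide_composition(seq)
            + correlation_factors(seq, prop, max_lag))


# Usage with a hypothetical property table (NOT real refractivity values).
toy_prop = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
vec = feature_vector("MKTAYIAKQR", toy_prop, max_lag=3)
print(len(vec))  # 423 = 20 + 400 + 3
```

The resulting vector has 20 + 400 + `max_lag` entries; in practice such vectors are usually scaled before being passed to a classifier such as an SVM.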
Using these features, the support vector machine algorithm and the nearest neighbor algorithm are applied to predict the subcellular locations of apoptosis proteins and Gram-negative bacterial proteins, respectively. Both methods achieve higher prediction success rates in jackknife tests.
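The jackknife test is leave-one-out cross-validation: each protein in turn is held out, the classifier is built from all remaining proteins, and the success rate is the fraction of held-out proteins whose location is predicted correctly. The sketch below pairs it with a 1-nearest-neighbor rule. Euclidean distance is an assumption (the abstract does not state which distance the thesis uses), the function names are hypothetical, and the toy feature vectors and labels are invented for illustration only.

```python
import math


def nearest_neighbor_label(query, train_X, train_y):
    """Return the label of the training vector closest to query (Euclidean)."""
    best = min(range(len(train_X)), key=lambda j: math.dist(query, train_X[j]))
    return train_y[best]  # ties go to the earliest training sample


def jackknife_accuracy(X, y, classify=nearest_neighbor_label):
    """Leave-one-out (jackknife) success rate of a classifier on (X, y)."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]  # every sample except the held-out one
        rest_y = y[:i] + y[i + 1:]
        if classify(X[i], rest_X, rest_y) == y[i]:
            correct += 1
    return correct / len(X)


# Toy example: two well-separated clusters with hypothetical location labels.
X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
y = ["cytoplasm", "cytoplasm", "membrane", "membrane"]
print(jackknife_accuracy(X, y))  # 1.0
```

Because `classify` is a parameter, the same loop could wrap an SVM or any other classifier; the abstract does not specify the SVM kernel or parameters used in the thesis. Note that the jackknife retrains once per sample, so its cost grows with dataset size.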
Keywords/Search Tags: Protein subcellular localization, Protein function, Feature extraction, Support vector machine, Nearest neighbor algorithm