Protein Subcellular Localization Prediction Based on Fusion Characteristics

Posted on: 2013-07-14; Degree: Master; Type: Thesis
Country: China; Candidate: S X Liu; Full Text: PDF
GTID: 2230330395485125; Subject: Electronics and communications engineering
Abstract/Summary:
With the completion of human gene mapping, biology has moved from the genome era into the post-genome era. One of the most remarkable features of this era is the explosive growth in the volume of protein sequence data. Bioinformatics studies have found that the location (organelle) of a protein in a cell is closely correlated with its biological function. It is therefore highly desirable to develop automated methods that efficiently identify attributes of uncharacterized proteins according to their subcellular localization.
Focusing on this topic, this thesis studies protein sequence encoding in depth, develops classification algorithms, and tests and analyzes these algorithms on several datasets. The main work and innovations of the thesis are summarized as follows:
To capture more global and local information from protein sequences, the thesis proposes a novel sequence representation made up of three groups of features: amino acid composition, compression tripeptide composition, and local frequency domain values. The representation carries amino acid composition information, local order information, and long-distance interactions between residues (including indirectly adjacent residues) along the sequence. The features and their related parameters were analyzed and validated in order to assess the encoding scheme.
To evaluate the prediction performance of the encoding scheme, a jackknife test based on the support vector machine algorithm was carried out on two datasets. The experimental results show that the features extracted by this method contain sufficient localization information, and that the method is a viable alternative for predicting protein subcellular localization.
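The composition-based part of such an encoding can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the six-group reduced alphabet used here to "compress" tripeptides is an assumption chosen for the example, all function names are hypothetical, and the local frequency domain values are left out because the abstract does not define them.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical reduced alphabet (an assumption, not taken from the thesis):
# the 20 residues are grouped into 6 rough physicochemical classes.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # glycine and proline
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}
TRIPLE_KEYS = ["".join(t) for t in product(sorted(GROUPS), repeat=3)]


def amino_acid_composition(seq):
    """Return the 20 relative residue frequencies of seq."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    n = len(residues)
    return [counts[aa] / n if n else 0.0 for aa in AMINO_ACIDS]


def reduced_tripeptide_composition(seq):
    """Return the 6**3 = 216 relative frequencies of reduced-alphabet tripeptides."""
    reduced = [RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP]
    triples = ["".join(reduced[i:i + 3]) for i in range(len(reduced) - 2)]
    counts = Counter(triples)
    total = len(triples)
    return [counts[k] / total if total else 0.0 for k in TRIPLE_KEYS]


def encode(seq):
    """Concatenate both feature groups into one 236-dimensional vector."""
    return amino_acid_composition(seq) + reduced_tripeptide_composition(seq)


if __name__ == "__main__":
    vector = encode("MKTAYIAKQRQISFVKSHFSRQ")  # arbitrary example sequence
    print(len(vector))
```

Grouping residues before counting tripeptides keeps the vector at 216 dimensions instead of the 8000 needed for raw tripeptides, which is one plausible reading of "compression"; the thesis may use a different scheme.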
Finally, the thesis uses the support vector machine and nearest neighbor algorithms to predict subcellular localization on two benchmark datasets, and compares the prediction results and evaluation index values with those of other methods. The comparison demonstrates that the proposed method is feasible and performs better than the methods it is compared with.
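A jackknife (leave-one-out) test with a nearest neighbor classifier can be sketched as below. This is an illustrative sketch under stated assumptions: Euclidean distance and a single nearest neighbor are choices made for the example, the function names are hypothetical, and the toy vectors and labels are invented rather than drawn from the thesis's datasets.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor_label(query, train_vectors, train_labels):
    """Return the label of the training vector closest to query (1-NN)."""
    best = min(range(len(train_vectors)),
               key=lambda i: euclidean(query, train_vectors[i]))
    return train_labels[best]


def jackknife_accuracy(vectors, labels):
    """Hold out each sample in turn, predict it from the rest, and
    return the fraction of held-out samples predicted correctly."""
    correct = 0
    for i in range(len(vectors)):
        train_x = vectors[:i] + vectors[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(vectors[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(vectors)


if __name__ == "__main__":
    # Toy 2-D feature vectors with made-up location labels.
    toy_vectors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
    toy_labels = ["cytoplasm", "cytoplasm", "nucleus", "nucleus"]
    print(jackknife_accuracy(toy_vectors, toy_labels))
```

Because every sample is held out exactly once, the jackknife estimate is deterministic for a given dataset, which makes results easier to compare across methods than a randomly split cross-validation.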
Keywords/Search Tags:Sequence feature extraction, Protein subcellular localization, Compression tripeptide composition, Local frequency domain values, Support vector machine, Nearest neighbor algorithm