The Sequence Analysis And Theoretical Prediction On Proteins Subcellular Location

Posted on: 2005-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: F M Li
Full Text: PDF
GTID: 2120360125452941
Subject: Theoretical Physics
Abstract/Summary:
The structure and function of a protein are closely correlated with its subcellular location. Based on the amino acid sequences of proteins, a new algorithm using the least increment of diversity and the finite coefficient of diversity is proposed to predict protein subcellular location. Predictions are made with different choices of information parameters in the source of diversity. This thesis comprises three chapters.

The first chapter analyzes the database. The occurrence probabilities of single amino acids and of adjacent residue-pairs are calculated for sequences from 12 subcellular locations, filtered from the SWISS-PROT databank (2002), and compared with each other. The results show that these occurrence probabilities differ clearly between most of the classes.

The second chapter states the theoretical method. The measure of diversity can reveal the total information of a system. Given this property and the statistical results of the first chapter, the concepts of the measure of diversity, the least increment of diversity and the finite coefficient of diversity are applied to predict the subcellular location of a protein. The locations are predicted using the least increment of diversity after a standardizing transformation of the data. In addition, the formulas for evaluating the performance of the predictions are given.

The third chapter gives the predictions of protein subcellular location, in two sections. First, based on the finite coefficient of diversity, proteins from four subcellular locations are predicted. The source of diversity is composed of the standardized occurrence numbers of the 20 amino acids. The four main subcellular locations (extracellular, cytoplasm, nucleus, plasma membrane), comprising 1824 proteins, are predicted using the self-consistency test and the jackknife test, respectively. High rates of correct prediction are obtained. Second, based on the least increment of diversity, proteins from twelve subcellular locations are predicted. The source of diversity is composed of the occurrence numbers of the 20 amino acids, of the 400 adjacent residue-pairs, and of both together, respectively. With the parameters standardized, the twelve subcellular locations (chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome, plasma membrane, vacuole) are predicted by the self-consistency test and the jackknife test, respectively. The results indicate that the source of diversity built from the 400 adjacent residue-pairs gives the best correct prediction rate. Combining the increments of diversity improves the prediction further. Finally, these results are discussed.
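The abstract does not state its formulas, so the following Python sketch is only an illustration of how a least-increment-of-diversity classifier could look. It assumes the commonly used definitions: for a source of diversity with counts n_1, ..., n_s and total N, the measure of diversity is D = N log N - sum(n_i log n_i), and the increment of diversity between two sources X and S is ID(X, S) = D(X + S) - D(X) - D(S). The function names, the toy sequences and the class labels are hypothetical. The thesis's standardizing transformation and its finite coefficient of diversity are not reproduced here.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered adjacent residue-pairs (hypothetical ordering).
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def composition(seq, use_pairs=False):
    """Occurrence counts of the 20 amino acids, or of the 400 adjacent pairs."""
    if use_pairs:
        counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
        return [counts[p] for p in PAIRS]
    counts = Counter(seq)
    return [counts[a] for a in AMINO_ACIDS]


def diversity(counts):
    """Assumed measure of diversity: N log N - sum(n_i log n_i), 0 log 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, s):
    """Assumed ID(X, S) = D(X + S) - D(X) - D(S); small when X resembles S."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)


def build_profiles(labelled, use_pairs=False):
    """Sum the counts of all training sequences for each location label."""
    profiles = {}
    for seq, label in labelled:
        counts = composition(seq, use_pairs)
        prev = profiles.get(label, [0] * len(counts))
        profiles[label] = [a + b for a, b in zip(prev, counts)]
    return profiles


def predict(seq, profiles, use_pairs=False):
    """Assign the location whose profile gives the least increment of diversity."""
    x = composition(seq, use_pairs)
    return min(profiles, key=lambda c: increment_of_diversity(x, profiles[c]))


def jackknife_accuracy(labelled, use_pairs=False):
    """Leave-one-out test: each protein is predicted from all the others."""
    correct = 0
    for i, (seq, label) in enumerate(labelled):
        rest = labelled[:i] + labelled[i + 1:]
        correct += predict(seq, build_profiles(rest, use_pairs), use_pairs) == label
    return correct / len(labelled)


# Toy data only; these are not real protein sequences or thesis results.
toy = [("AAAAAG", "loc_a"), ("AAAGAA", "loc_a"),
       ("KKKKKL", "loc_k"), ("KKLKKK", "loc_k")]
toy_profiles = build_profiles(toy)
```

In this sketch the self-consistency test would correspond to calling `predict` on each training sequence against `build_profiles` of the full set, while `jackknife_accuracy` rebuilds the profiles with the query protein left out. Passing `use_pairs=True` switches the source of diversity from the 20 amino acids to the 400 adjacent residue-pairs.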
Keywords/Search Tags: measure of diversity, least increment of diversity, finite coefficient of diversity, subcellular location