A Study On The Prediction Of The Subnuclear Location Of Nuclear Proteins And The Subcellular Location Of Plant, Non-Plant And Mouse Proteins

Posted on: 2008-06-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F M Li
GTID: 1100360245487031
Subject: Biophysics
Abstract/Summary:
Although a large number of protein sequences have become known with the success of the human genome project, their functions have not been completely annotated. The function of a protein is closely correlated with its subcellular location, so studying the subcellular location can provide an important clue to the protein's function.

In this dissertation, different information parameters are selected from the primary structures of nuclear proteins, plant and non-plant proteins, and mouse membrane proteins, and the subcellular locations of these proteins are predicted using the increment of diversity (ID), the increment of diversity combined with the covariant discriminant algorithm (CDA), and the support vector machine (SVM). The main contributions are summarized as follows:

(1) With the amino acid composition and the amino acid hydropathy dipeptide composition of the protein sequence selected as the information parameters of the diversity measure, the subnuclear localizations of nuclear proteins are predicted using diversity coefficient (DC) methods. Moreover, with the 1-gap and 2-gap dipeptide compositions of the N-terminal region selected as the information parameters, an algorithm combining the increment of diversity with the covariant discriminant (ID_CDA) is proposed for predicting the subnuclear localizations of nuclear proteins. With this method, the overall prediction accuracies are 75.4% for single-localization proteins in the jackknife test and 80.4% for an independent set of multi-localization proteins. These results are 8.9% and 15.2% higher than those of Lei's SVM method for single-localization and multi-localization proteins, respectively.
This method is also applied to predict the subnuclear localizations of nuclear proteins with <25% sequence identity, and a higher overall prediction accuracy is obtained.

(2) A new ID_SVM approach is proposed that combines the increment of diversity (ID) with the support vector machine (SVM), using amino acid compositions (AA) and pseudo amino acid compositions (PseAA). The subcellular locations of plant and non-plant proteins are predicted with the ID_SVM method. The overall prediction accuracies in jackknife tests are 88.3% and 92.4% for the eukaryotic plant and non-plant proteins. The results show that the overall success rates of the ID_SVM module are higher than those of other methods. The results of ID and ID_SVM are compared when a single parameter is used, and hybrid parameters are also used in ID_SVM. The results show that higher prediction accuracy can be obtained with properly chosen hybrid parameters.

(3) Based on the amino acid composition and the dipeptide composition, the subcellular locations of 12 classes of eukaryotic proteins are predicted using the ID algorithm. Higher prediction success rates are achieved when hybrid parameters are put into the diversity source.

(4) A mouse protein database and a mouse membrane protein database are constructed. With the 1-gap dipeptide composition along the whole protein sequence and the dipeptide composition of the N-terminal region as parameters, the ID_CDA method is also applied to predict the locations of mouse proteins and mouse membrane proteins. Better results are obtained.

(5) The subcellular locations of Gram-negative bacterial proteins are investigated using the ID method with the amino acid composition, the dipeptide composition, the amino acid hydropathy distribution along the whole protein sequence, and the dipeptide composition of the N- and C-terminal regions as parameters. The prediction results obtained with a single information parameter and with hybrid information parameters are discussed.
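The abstract does not define the increment of diversity or the gap dipeptide composition, so the following Python sketch is only an illustration under stated assumptions, not the dissertation's implementation. It assumes the commonly used diversity measure D(X) = N ln N - sum_i n_i ln n_i (N being the total count), takes ID(X, Y) = D(X + Y) - D(X) - D(Y), and assigns a query to the class whose pooled counts give the smallest ID. The function names, the 20-letter alphabet, and the leave-one-out form of the jackknife test are hypothetical choices made here; the N-terminal region length, the CDA stage, and the SVM stage are not reproduced.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed standard 20-residue alphabet


def gap_dipeptide_counts(seq, gap=0):
    """Count residue pairs separated by `gap` intervening positions.

    gap=0 gives the ordinary dipeptide composition, gap=1 the 1-gap
    dipeptide composition, and so on. Returns a 400-element count vector.
    """
    step = gap + 1
    pairs = Counter()
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            pairs[a + b] += 1
    return [pairs[a + b] for a, b in product(AMINO_ACIDS, repeat=2)]


def diversity(counts):
    """Assumed diversity measure: D = N ln N - sum(n_i ln n_i)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small when X and Y are similar."""
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)


def predict(query_counts, class_sources):
    """Return the label whose pooled counts have the smallest ID with the query.

    Classes with an empty source are skipped, because the ID against an
    all-zero vector is trivially 0.
    """
    candidates = {lab: src for lab, src in class_sources.items() if sum(src) > 0}
    return min(candidates, key=lambda lab: increment_of_diversity(query_counts, candidates[lab]))


def jackknife_accuracy(samples, gap=1):
    """Leave-one-out accuracy for a list of (sequence, label) pairs."""
    feats = [(gap_dipeptide_counts(seq, gap), lab) for seq, lab in samples]
    sources = {}
    for counts, lab in feats:
        src = sources.setdefault(lab, [0] * len(counts))
        for i, c in enumerate(counts):
            src[i] += c
    correct = 0
    for counts, lab in feats:
        held_out = dict(sources)
        # Remove the test sequence's own counts from its class source.
        held_out[lab] = [s - c for s, c in zip(sources[lab], counts)]
        correct += predict(counts, held_out) == lab
    return correct / len(feats)
```

Calling `jackknife_accuracy(samples, gap=1)` on a list of labelled sequences gives the fraction of held-out sequences assigned to their own class. Pooling counts per class keeps each leave-one-out step cheap, since removing one sequence only requires subtracting its count vector from one class source.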
Keywords/Search Tags: Subcellular location, Subnuclear location, Increment of diversity, Covariant discriminant algorithm, Support vector machine, Mouse membrane protein, Eukaryotic protein, Prokaryotic protein