
The Research On The Protein Subcellular Localization And Protein Interactions Based On The Sequence Coding Method

Posted on: 2014-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: L C Wang
GTID: 2250330425984217
Subject: Computer technology
Abstract/Summary:
The explosive growth of bioinformatics and the smooth implementation of the Human Genome Project have greatly promoted the development of the life sciences. As the main undertaker of life activity, protein is closely related to various life activities: life activities are inseparable from interactions between proteins, and protein interactions are closely linked to the subcellular localization of the proteins. Research on these protein-related areas is therefore very significant. In addition, predicting protein subcellular localization and studying protein interactions will help researchers further understand the mechanisms of life activities. Generally speaking, research on protein function mainly involves two steps: one is to extract features from the protein sequence, and the other is to select the classification algorithm or classification model. Therefore, this paper concentrates primarily on protein sequence coding algorithms and then applies them to protein subcellular localization and protein-protein interactions. The major innovations of this paper are as follows:

(1) A new protein sequence coding method is proposed based on the pseudo amino acid composition. This method not only retains the amino acid composition information but also introduces the location information of the amino acid residues in the protein sequence; it also takes into consideration the physicochemical properties of the amino acid residues and the correlation between residues in the sequence. With this method, the sequence feature information that is closely related to protein subcellular localization can be successfully extracted and converted to a numerical feature vector.
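As a rough illustration of the kind of encoding described in (1), the following is a minimal sketch of a classic pseudo amino acid composition vector: 20 composition terms plus `lam` sequence-order correlation terms. It is not the thesis's own method. The function name `pseaac`, the parameters `lam` and `w`, and the use of a single property (the Kyte-Doolittle hydropathy scale) are assumptions made for the example; the abstract does not say which properties or weights were used.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as a stand-in property.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale property values to zero mean and unit variance over the 20 residues."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (value - mu) / sd for aa, value in scale.items()}


def pseaac(seq, lam=3, w=0.05, scale=KD_HYDROPATHY):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    residues = [aa for aa in seq.upper() if aa in scale]  # skip unknown symbols
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    h = _standardize(scale)

    # Plain amino acid composition: order-independent information.
    counts = Counter(residues)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]

    # Sequence-order terms: mean squared property difference between
    # residues that are k positions apart, for k = 1..lam.
    thetas = [
        mean((h[residues[i]] - h[residues[i + k]]) ** 2 for i in range(n - k))
        for k in range(1, lam + 1)
    ]

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

Calling `pseaac(sequence, lam=3)` yields 23 non-negative values that sum to 1. The first 20 carry composition and the last 3 carry position-dependent correlation, which is the extra information a plain composition vector lacks.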
This paper then selects two typical datasets as the training and testing sets, and uses the K nearest neighbor classification algorithm as the classifier for sample training and testing. In the experiments, the results show that our method has better prediction performance than other existing methods.

(2) Many factors that affect protein-protein interaction are comprehensively considered, and a new sequence coding method based on fusion features is proposed. This method contains the sequence characteristic information. To capture the order information of the amino acid residues in the sequence, the triplet coding method is introduced, but this method makes the dimension of the feature vector higher. To reduce the dimension of the feature vector, the 20 amino acids are divided into 7 classes according to their physicochemical properties. Taking into consideration the correlation characteristics between amino acids, which are closely related to protein interactions, a new autocorrelation feature coding method is introduced. Finally, to evaluate the predictive performance of the coding method, this paper selects three different datasets and uses a support vector machine as the classification algorithm for sample training and prediction. The experiments show that the proposed method performs well and, compared with other existing algorithms, still has certain advantages.
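The dimension reduction described in (2) can be sketched as follows: with 20 residue types, triplet coding has 20^3 = 8000 possible triplets, while 7 classes give 7^3 = 343. This is a minimal sketch, not the thesis's implementation. The 7-class grouping below is the one widely used in the conjoint triad literature and is an assumption here, since the abstract does not list the classes. The names `conjoint_triad` and `pair_features`, and the choice to concatenate the two proteins' vectors, are likewise hypothetical.

```python
from itertools import product

# Assumed 7-class grouping of the 20 amino acids (common in the
# conjoint triad literature; the abstract does not specify its classes).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}

# All 7 * 7 * 7 = 343 possible class triplets, each mapped to a vector index.
TRIADS = list(product(range(len(GROUPS)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(seq):
    """Return the 343-dimensional frequency vector of class triplets in seq."""
    classes = [CLASS_OF[aa] for aa in seq.upper() if aa in CLASS_OF]
    total = len(classes) - 2  # number of overlapping triplets
    if total <= 0:
        raise ValueError("sequence needs at least 3 standard residues")
    vec = [0.0] * len(TRIADS)
    for i in range(total):
        vec[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1.0
    return [count / total for count in vec]


def pair_features(seq_a, seq_b):
    """Concatenate both proteins' vectors into one 686-dimensional sample."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)
```

A vector from `pair_features` could then be passed to any standard classifier, such as a support vector machine. Because residues in the same class are treated as identical, `conjoint_triad("AAA")` and `conjoint_triad("GVA")` produce the same vector, which is the price paid for the smaller feature space.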
Keywords/Search Tags: Subcellular Localization, Protein Interaction Prediction, Pseudo Amino Acid Composition, K Nearest Neighbor, Support Vector Machine