
Incorporating Secondary Features Into The General Form Of Chou’s PseAAC For Predicting Protein Structural Class

Posted on: 2013-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Xiang
GTID: 2230330395985225
Subject: Computer Science and Technology

Abstract/Summary:
Proteins perform irreplaceable functions in living organisms, so the study of proteins is becoming more and more important. Protein structural class prediction plays an important role in predicting many related attributes, such as protein subcellular localization, membrane protein type, G-protein-coupled receptor type, enzyme family class, protein quaternary structure type, and enzyme activity. Therefore, research on protein structural class is very important in molecular biology.

As the most primitive data, protein sequences contain much latent information that is valuable for protein research. Consequently, mining protein sequence information, and studying protein structure, function, interaction, and subcellular localization from sequence information, have attracted researchers' attention in these areas.

Based on the basic properties of proteins, this thesis reviews and analyzes the current state of protein structural class prediction and discusses four aspects of the predictive model: first, how to construct the data set; second, feature extraction, where sequence-based features including amino acid composition, coupled composition, and pseudo amino acid composition are analyzed; third, the commonly used classification models; and fourth, a comparison of different evaluation indicators.

Building on this analysis, the thesis presents a new sequence feature extraction method. The method is based on Chou-Fasman parameter data and combines amino acid composition, amino acid hydrophobicity, polarity, and part of the Pair-Coupled Amino Acid Composition data. It can greatly reduce data redundancy and avoid the negative effect of some data on the results. With the extracted features, an SVM model evaluated by the leave-one-out method was used to predict the structural class of a set of 639 proteins; the results demonstrate the validity of the method.
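As a rough illustration of the sequence-based features and the leave-one-out protocol mentioned in the abstract, the following is a minimal sketch and not the thesis's actual method. It assumes the 20 standard amino acids, approximates the pair-coupled composition by plain adjacent-pair frequencies, and omits the Chou-Fasman, hydrophobicity, and polarity terms, whose exact form the abstract does not give. All function names are hypothetical.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def _clean(seq):
    """Upper-case the sequence and drop non-standard residue codes.

    Note: dropping a residue makes its two neighbours adjacent.
    """
    return "".join(r for r in seq.upper() if r in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Return the 20-dimensional vector of residue frequencies."""
    seq = _clean(seq)
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def adjacent_pair_composition(seq):
    """Return the 400-dimensional vector of adjacent-pair frequencies.

    This is a simplified stand-in for pair-coupled composition.
    """
    seq = _clean(seq)
    total = len(seq) - 1
    if total <= 0:
        return [0.0] * len(PAIRS)
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        counts[seq[i:i + 2]] += 1
    return [counts[p] / total for p in PAIRS]


def leave_one_out(n):
    """Yield (train_indices, test_index) for each of n samples."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i
```

In a full pipeline, each protein's feature vectors would be concatenated and passed to a classifier such as an SVM, with one model trained per leave-one-out split and tested on the held-out protein.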
Keywords/Search Tags: Protein structure class prediction, Feature extraction, Chou-Fasman data, Support vector machine (SVM)