
The Statistical Analysis And Recognition Of Complex Super Secondary Structure βαβ Motifs In Proteins

Posted on: 2014-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: L X Sun
Full Text: PDF
GTID: 2250330422956369
Subject: Physical Electronics
Abstract/Summary:
With the rapid development of high-throughput computing and information technology, and with the growing number of known protein structures, predicting protein structure by theoretical calculation has become more efficient and less costly. The prediction of complex super secondary structure is a key step in tertiary structure research, and the βαβ motif is an important complex super secondary structure in proteins. Accurate prediction of βαβ motifs is therefore important not only for studying the tertiary structure and function of proteins, but also as guidance for the development and design of drug molecules. This thesis focuses on predicting the complex super secondary structure βαβ motif. The work proceeds in the following steps:

(1) We predicted the complex super secondary structure βαβ motif for the first time. Starting from the amino acid sequences of proteins, we obtained 4442 proteins with resolution below 3.0 Å and sequence identity below 25%, and then constructed three new βαβ motif datasets using the DSSP and PROMOTIF software.

(2) We performed a statistical analysis of the length and core structure of βαβ motifs and non-βαβ motifs. The samples selected for study were those whose loop-α-loop length is between 10 and 26 amino acids, for which the ideal fixed-length pattern is 32 amino acids. Fixed-length patterns were generated in five ways for both βαβ motifs and non-βαβ motifs, and the amino acid conservation at each position was then analyzed.

(3) Using sequence information together with predicted structure and function information to represent the sequence characteristics, we propose a Support Vector Machine algorithm for predicting βαβ motifs. Comparing the results of the 5-fold cross-validation test and the independent test across the three datasets shows that both tests perform well on the SET3 dataset. The overall prediction accuracies of 5-fold cross-validation and independent testing are 85.9% and 83.7%, and the corresponding Matthews correlation coefficients are 0.72 and 0.67. The dataset defined by DSSP and PROMOTIF together is therefore more favorable for predicting βαβ motifs. In addition, with optimized parameters, the Support Vector Machine algorithm is an effective method for predicting βαβ motifs.

(4) The Random Forest algorithm and the Support Vector Machine algorithm were both used to predict βαβ motifs with the same characteristic parameters and the same test method. The Random Forest algorithm gave better predictive results than the Support Vector Machine algorithm. Random Forest is an ensemble classifier made up of many decision trees, and its final classification is decided by the votes of all the trees, so it outperforms the single-classifier Support Vector Machine. Random Forest also has an advantage when handling high-dimensional parameters. We further used the Random Forest algorithm to predict βαβ motifs based on the position-specific hydropathy component. In 5-fold cross-validation, the overall accuracy and Matthews correlation coefficient reached 88.9% and 0.78. The Random Forest algorithm is thus a very effective means of predicting βαβ motifs.
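The two evaluation measures quoted above, overall accuracy and the Matthews correlation coefficient, can both be computed from the four counts of a binary confusion matrix. The following is a minimal Python sketch of the standard formulas, not the code used in the thesis; the function name and the example counts are hypothetical and are not taken from the thesis datasets.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary classifier.

    tp/fn: betabetabeta-style positives (here: beta-alpha-beta motifs)
           predicted correctly / missed.
    tn/fp: negatives (non-motifs) predicted correctly / wrongly.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0

    # Standard Matthews correlation coefficient for a 2x2 confusion matrix.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc


# Hypothetical counts, chosen only to illustrate the calculation.
acc, mcc = accuracy_and_mcc(tp=45, tn=40, fp=10, fn=5)
print(f"accuracy = {acc:.3f}, MCC = {mcc:.3f}")
```

Accuracy alone can look high on an unbalanced dataset, whereas the Matthews correlation coefficient uses all four counts, which is one reason the two measures are usually reported together.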
Keywords/Search Tags: βαβ motif, Complex super secondary structure, Protein, Random forest algorithm, Support vector machine, Structure prediction