
Protein Tertiary Structure Prediction Based On The Integration Of Multiple Classification Models

Posted on: 2016-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Bao
GTID: 2180330464969115
Subject: Computer Science and Technology
Abstract/Summary:
Research has now entered the post-genomic era, which focuses mainly on analyzing and solving problems in bioinformatics and proteomics. Protein structure prediction is a major research topic in both fields. As machine learning methods are applied to bioinformatics problems with increasing maturity, purely biological experiments are increasingly being replaced by computational intelligence methods. Solving such problems requires attention to three aspects: first, extracting the key information that characterizes a protein; second, building a classification model grounded in the corresponding biological principles; and third, selecting an appropriate ensemble strategy. This thesis proposes a novel tree-structured classification model, based on ensemble strategies, for predicting protein tertiary structure.

In this work, the statistical information of protein sequences is refined and combined with the physicochemical properties of amino acids to define a generalized polypeptide correlation coefficient. Based on the molten globule hypothesis, hydrophobic-pattern features and secondary structure propensity features are also used. In designing the classification model, the biological significance and the specific nature of the problem play a key role, so a two-stage model is proposed: it first solves a three-class division and then a two-class division. At each classification node, multiple classifiers and integrated feature groups are combined by voting. To validate the results, four classical datasets are used: ASTRAL, C204, 640 and 1189. These datasets differ in the degree of sequence homology among their proteins.
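The abstract does not give the formula for the generalized polypeptide correlation coefficient. As a point of reference, the sketch below computes the closely related sequence-order correlation factors of Chou's pseudo amino acid composition, which likewise combine sequence statistics (residue composition) with an amino-acid property. The choice of the Kyte-Doolittle hydropathy scale, the lag `max_lag=3`, the weight `0.05` and all function names are assumptions for illustration, not the thesis's definition.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values: the single physicochemical property used
# in this sketch (an assumption; the thesis may combine several properties).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)


def standardize(scale):
    """Rescale a property table to zero mean and unit variance over the 20 residues."""
    mu, sd = mean(scale.values()), pstdev(scale.values())
    return {aa: (v - mu) / sd for aa, v in scale.items()}


H = standardize(KD)


def clean(seq):
    """Upper-case the sequence and drop non-standard residue codes."""
    return [aa for aa in seq.upper() if aa in H]


def correlation_factors(seq, max_lag=3):
    """theta_k = average of (H[s_i] - H[s_{i+k}])**2 over all residue pairs k apart."""
    s = clean(seq)
    if len(s) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    return [
        mean((H[s[i]] - H[s[i + k]]) ** 2 for i in range(len(s) - k))
        for k in range(1, max_lag + 1)
    ]


def feature_vector(seq, max_lag=3, weight=0.05):
    """20 composition terms followed by max_lag weighted correlation terms, summing to 1."""
    s = clean(seq)
    comp = [s.count(aa) / len(s) for aa in AMINO_ACIDS]
    theta = correlation_factors(seq, max_lag)
    total = sum(comp) + weight * sum(theta)
    return [c / total for c in comp] + [weight * t / total for t in theta]
```

A homopolymer such as `"AAAAAA"` has all correlation factors equal to zero, so its vector reduces to pure composition; sequences that alternate hydrophobic and polar residues raise the correlation terms.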
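One way to picture the double-stage tree with voting at each node is sketched below. It assumes (the abstract does not say so) that the three-way stage separates all-α, all-β and a merged "mixed" group, and that the two-way stage then splits the mixed group into α/β and α+β, the usual four structural classes. The toy threshold voters and the `(helix_fraction, strand_fraction)` input are hypothetical stand-ins for the trained base classifiers and feature groups.

```python
from collections import Counter
from typing import Callable, Sequence

# A base classifier maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> str:
    """Label predicted by most classifiers; ties go to the label voted first."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


class TwoStageTree:
    """Stage 1: all-alpha / all-beta / mixed. Stage 2: mixed -> alpha/beta or alpha+beta."""

    def __init__(self, root_voters, mixed_voters):
        self.root_voters = root_voters
        self.mixed_voters = mixed_voters

    def predict(self, x):
        coarse = majority_vote(self.root_voters, x)
        if coarse != "mixed":
            return coarse
        # Only samples routed to "mixed" reach the second, two-way node.
        return majority_vote(self.mixed_voters, x)


# Hypothetical toy voters on x = (helix_fraction, strand_fraction); real nodes
# would hold trained models over different feature groups.
def make_root_voter(th):
    def clf(x):
        a, b = x
        if a > th and b < 0.1:
            return "all-alpha"
        if b > th and a < 0.1:
            return "all-beta"
        return "mixed"
    return clf


def make_mixed_voter(th):
    def clf(x):
        return "alpha/beta" if sum(x) > th else "alpha+beta"
    return clf


tree = TwoStageTree(
    root_voters=[make_root_voter(t) for t in (0.3, 0.4, 0.5)],
    mixed_voters=[make_mixed_voter(t) for t in (0.5, 0.5, 0.7)],
)
```

For example, `tree.predict((0.45, 0.05))` gets two "all-alpha" votes and one "mixed" vote at the root, so it stops at stage 1 with "all-alpha". Keeping the second node separate lets it use voters and features tuned only for the harder α/β versus α+β distinction.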
In addition, a comparison experiment using a single-stage one-vs-all model is carried out. Because the biological information contains considerable redundancy, reducing the size of the feature groups decreases the running time of each classifier. This thesis employs a flexible neural tree (FNT), an artificial neural network optimized with particle swarm optimization (ANN+PSO) and a support vector machine (SVM) as base classifiers in the novel tree-structured classification model. The model exploits the diversity of these classifiers and selects a suitable voting strategy from ensemble learning across the different feature groups and their corresponding classifiers, with the aim of improving its performance. This approach can effectively compensate for the weaknesses of individual classifiers. Given the size of the feature set, the Pearson correlation coefficient is used to remove redundant features. The results show that the selected features and the novel classification model are effective for protein tertiary structure prediction.
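ANN+PSO replaces gradient-based training with particle swarm optimization over the network weights. The minimal global-best PSO below illustrates the idea on a single sigmoid neuron learning logical AND; the swarm size, inertia `w=0.7`, acceleration coefficients `c1=c2=1.5`, the toy data and the function names are assumptions, not the settings used in the thesis.

```python
import math
import random


def pso_minimize(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Global-best particle swarm optimization of f: list[float] -> float."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy "network": one sigmoid neuron whose weights [w1, w2, b] the swarm tunes.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def neuron(weights, x):
    w1, w2, b = weights
    z = w1 * x[0] + w2 * x[1] + b
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def mse(weights):
    """Mean squared error of the neuron on AND_DATA: the fitness PSO minimizes."""
    return sum((neuron(weights, x) - y) ** 2 for x, y in AND_DATA) / len(AND_DATA)
```

Calling `pso_minimize(mse, dim=3)` returns a weight vector and its error. A real ANN would simply flatten all layer weights into the particle position; no gradient of the loss is ever needed.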
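A simple way to apply the Pearson correlation coefficient to redundancy reduction is a greedy filter: keep a feature only if its absolute correlation with every feature already kept stays at or below a threshold. The threshold of 0.9, the keep-first ordering and the function names are assumptions; the abstract does not specify how the coefficient was applied.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined linear correlation
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.9):
    """Return indices of the feature columns kept by a greedy correlation filter.

    columns: list of feature columns, each a list of values over the samples.
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept
```

With `columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2], [1, 1, 1, 1]]`, the second column is an exact multiple of the first (r = 1) and is dropped, while the third (r = -0.4) is kept. Note that this sketch keeps constant columns; a variance filter would normally remove them separately.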
Keywords/Search Tags: Protein tertiary structure prediction, Flexible Neural Tree, Hierarchical classification, Ensemble strategy