
Prediction Of Protein Tertiary Structural Classes Based On Predicted Secondary Structure

Posted on: 2017-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: F L Kong
GTID: 2310330488468642
Subject: Computer Science and Technology

Abstract/Summary:
As research enters the post-genome era, the focus of bioinformatics has shifted, and the prediction of protein tertiary structural classes has become an active research topic. Proteins are widely studied biological macromolecules whose function is closely tied to their structure, so determining protein tertiary structure is an important task in biology. Research on tertiary structure prediction supports work on protein function, protein localization, drug design and other applications. Experimental methods for determining protein tertiary structure remain severely limited, and the difficulty of obtaining tertiary structures is a bottleneck both for understanding protein function and for developing protein-related industries. Researchers have continually proposed new prediction methods, and prediction accuracy has improved: methods reach high accuracy on high-similarity datasets, but accuracy on low-similarity datasets remains unsatisfactory. Given the rapid progress of machine learning in recent years, this thesis applies machine learning methods to predicting protein tertiary structural classes on low-similarity datasets. The experiments improve prediction accuracy in three respects: better feature extraction, a more rational classification model, and ensemble learning.

First, whereas the most commonly used feature extraction methods are based on the amino acid sequence, this thesis converts the amino acid sequence into a secondary structure sequence, which simplifies the feature representation. Previous experimental results show that the α/β and α+β classes are poorly distinguished, so new features based on the different biological characteristics of these two classes are proposed.

Second, two hierarchical classification models with different structures were built. To evaluate their performance, three low-similarity datasets were used: the 640, 25PDB and 1189 datasets. With the flexible neural tree as the base classifier, prediction results were obtained for both models, and the better-performing hierarchical model was selected.

Third, building on the previous experiment, a new experiment was designed around ensemble learning. The flexible neural tree, support vector machine and artificial neural network were chosen as base classifiers, and two feature extraction methods, one based on the amino acid sequence and one based on the secondary structure, were used to build two different sets of feature vectors. Finally, five different base classifiers were combined to implement the ensemble.

Compared with results reported in other literature, prediction accuracy improved by 5.57%, 4.53% and 2.16% on the three datasets, respectively. These results show that the proposed method improves the accuracy of protein tertiary structural class prediction on low-similarity datasets and is feasible and effective.
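The abstract does not list the secondary-structure features it uses. As a hedged illustration only, the sketch below assumes a 3-state predicted secondary structure string (H = helix, E = strand, C = coil) and computes simple content, segment-count and mean-segment-length features, plus a hypothetical `he_alternation` feature intended to reflect one known difference between α/β proteins (helices and strands tend to alternate) and α+β proteins (helices and strands tend to be segregated). The function names and the feature set are illustrative assumptions, not the thesis's definitions.

```python
from itertools import groupby


def segments(ss: str) -> list[tuple[str, int]]:
    """Collapse a secondary-structure string into (state, run_length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def ss_features(ss: str) -> dict[str, float]:
    """Illustrative (hypothetical) features from a 3-state H/E/C predicted structure."""
    if not ss or set(ss) - set("HEC"):
        raise ValueError("expected a non-empty string over the alphabet H, E, C")
    n = len(ss)
    segs = segments(ss)
    feats: dict[str, float] = {}
    for state in "HEC":
        lengths = [length for s, length in segs if s == state]
        feats[f"content_{state}"] = ss.count(state) / n   # fraction of residues in this state
        feats[f"segments_{state}"] = len(lengths) / n      # segment count, length-normalized
        feats[f"mean_len_{state}"] = sum(lengths) / len(lengths) if lengths else 0.0
    # Hypothetical alpha/beta vs alpha+beta cue: ignoring coils, the fraction of
    # neighbouring helix/strand segments that switch type. Alternating H-E-H-E
    # patterns give 1.0; segregated H-H-E-E patterns give lower values.
    he = [s for s, _ in segs if s in "HE"]
    switches = sum(1 for a, b in zip(he, he[1:]) if a != b)
    feats["he_alternation"] = switches / (len(he) - 1) if len(he) > 1 else 0.0
    return feats


alternating = "CCHHHHCCEEEECCHHHHCCEEEECC"
segregated = "CCHHHHCHHHHCCEEEECEEEECC"
print(ss_features(alternating)["he_alternation"])            # 1.0
print(round(ss_features(segregated)["he_alternation"], 3))   # 0.333
```

Working on the reduced H/E/C alphabet keeps the feature vector short compared with amino-acid-based encodings, which is the simplification the abstract points to.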
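The flexible neural tree (FNT) named as the base classifier is a tree-structured network whose internal nodes combine weighted child outputs through a flexible activation function. A minimal sketch of the forward pass follows, assuming the Gaussian-shaped activation exp(-((net - a) / b)^2) commonly used in the FNT literature. Training (the search over tree structures and the optimization of weights and the a, b parameters) is omitted, and the class names `Leaf` and `FlexNode` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Leaf:
    index: int  # terminal node: returns input feature x[index]


@dataclass
class FlexNode:
    children: list[Leaf | FlexNode]
    weights: list[float]  # one weight per child
    a: float  # centre of the flexible activation
    b: float  # width of the flexible activation (must be non-zero)


def evaluate(node: Leaf | FlexNode, x: list[float]) -> float:
    """Forward pass: net = sum(w_j * child_j), output = exp(-((net - a) / b) ** 2)."""
    if isinstance(node, Leaf):
        return x[node.index]
    net = sum(w * evaluate(child, x) for w, child in zip(node.weights, node.children))
    return math.exp(-(((net - node.a) / node.b) ** 2))


# A two-level tree over three input features (parameters set by hand, not trained).
inner = FlexNode(children=[Leaf(0), Leaf(1)], weights=[1.0, 1.0], a=1.0, b=1.0)
root = FlexNode(children=[inner, Leaf(2)], weights=[0.5, 0.5], a=1.0, b=2.0)
print(evaluate(root, [0.5, 0.5, 1.0]))  # 1.0
```

Because each subtree reads only the inputs at its leaves, an FNT can implicitly select features, which is one reason it is attractive as a base classifier.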
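The abstract does not describe the two hierarchical models in detail. One plausible two-stage layout, given that α/β and α+β are the hard pair, first separates all-α, all-β and a merged "mixed" class, then applies a second classifier trained only on mixed samples to split α/β from α+β. The sketch assumes this layout and uses a nearest-centroid classifier as a stand-in base learner where the thesis uses a flexible neural tree; `TwoStageClassifier`, `NearestCentroid` and the label strings are hypothetical names.

```python
import math

MIXED = {"alpha/beta", "alpha+beta"}


class NearestCentroid:
    """Stand-in base learner; the thesis uses a flexible neural tree instead."""

    def fit(self, X, y):
        groups: dict[str, list[list[float]]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # Per-class mean feature vector.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c])) for x in X]


class TwoStageClassifier:
    """Stage 1: all-alpha / all-beta / mixed. Stage 2: alpha/beta vs alpha+beta."""

    def __init__(self, make_learner=NearestCentroid):
        self.stage1 = make_learner()
        self.stage2 = make_learner()

    def fit(self, X, y):
        coarse = ["mixed" if label in MIXED else label for label in y]
        self.stage1.fit(X, coarse)
        mixed = [(x, label) for x, label in zip(X, y) if label in MIXED]
        self.stage2.fit([x for x, _ in mixed], [label for _, label in mixed])
        return self

    def predict(self, X):
        preds = list(self.stage1.predict(X))
        mixed_idx = [i for i, p in enumerate(preds) if p == "mixed"]
        if mixed_idx:
            fine = self.stage2.predict([X[i] for i in mixed_idx])
            for i, label in zip(mixed_idx, fine):
                preds[i] = label
        return preds


X = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
     [0.5, 0.5, 1.0], [0.5, 0.5, 0.9], [0.5, 0.5, 0.0], [0.5, 0.5, 0.1]]
y = ["all-alpha", "all-alpha", "all-beta", "all-beta",
     "alpha/beta", "alpha/beta", "alpha+beta", "alpha+beta"]
model = TwoStageClassifier().fit(X, y)
print(model.predict([[0.5, 0.5, 0.8]]))  # ['alpha/beta']
```

Dedicating the second stage to the α/β versus α+β decision lets that classifier, and any features designed for that pair, focus on the distinction the abstract identifies as weakest.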
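The abstract does not state how the outputs of the five base classifiers are combined, nor which classifier and feature-set pairings make up the five. Plurality voting is one common combination rule; the sketch assumes it, breaking ties in favour of the label chosen by the earliest-listed classifier. `plurality_vote` is a hypothetical helper, and the example predictions are made up for illustration.

```python
from collections import Counter


def plurality_vote(predictions: list[list[str]]) -> list[str]:
    """Combine base-classifier outputs; predictions[k][i] is classifier k's label for sample i."""
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one classifier, all with the same number of predictions")
    combined = []
    for votes in zip(*predictions):  # votes: one label per classifier for a single sample
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break: the earliest classifier whose label has the top count.
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Five hypothetical base classifiers, three proteins.
preds = [
    ["all-alpha", "alpha/beta", "all-beta"],
    ["all-alpha", "alpha+beta", "all-beta"],
    ["all-beta", "alpha/beta", "alpha+beta"],
    ["all-alpha", "alpha+beta", "all-beta"],
    ["all-alpha", "alpha/beta", "alpha+beta"],
]
print(plurality_vote(preds))  # ['all-alpha', 'alpha/beta', 'all-beta']
```

Voting benefits from base classifiers that make different errors, which is why mixing classifier types (FNT, SVM, ANN) with two distinct feature sets is a reasonable way to build diversity.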
Keywords/Search Tags: prediction of protein tertiary structural classes, protein secondary structure, hierarchical classification, ensemble learning