
The Machine Learning Model Of Protein Structural Prediction Based On Protein Sequence

Posted on: 2016-06-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L C Zhang
GTID: 1220330473458054
Subject: Genetics
Abstract/Summary:
Analyzing the structure and biological function of proteins from the enormous volume of protein sequence data is a major challenge in the post-genome era. Protein structural class, which describes the overall folding pattern of a protein, is an important source of information for explaining protein structure and function, and it provides a theoretical basis for developing related biotechnology. However, identifying the structural class by traditional biological experiments is costly and slow. An effective computational model that predicts protein structural class with mathematical and computational methods is therefore urgently needed to compensate for the limitations of experiment. This study focuses on protein structural class prediction and investigates feature representation methods based on the theory and methods of statistical pattern recognition. The main contributions are as follows:

(1) A feature representation based on secondary structure information is proposed. Starting from predicted secondary structure sequences, features are designed from three aspects (the content of secondary structure elements, their order, and the distances between them) to reflect the content and spatial arrangement of the elements. In particular, several distance-related features are proposed. Experimental results on four low-similarity benchmark datasets show that the method effectively reflects the spatial arrangement of secondary structure: the overall accuracy and the accuracies for the α/β and α+β classes improve markedly. This confirms that the model is an effective feature representation method. In addition, 9 features are defined on rare secondary structures to study how rare secondary structure information affects structural class prediction. Experimental results show that this information can improve prediction performance.

(2) A feature representation method based on protein sequence evolutionary information is proposed. Evolutionary information reflects the conservation of amino acid residues in a given protein sequence, which is significant for revealing protein structure and function. Based on the position-specific scoring matrix obtained from PSI-BLAST, five evolutionary difference formulas are proposed to extract evolutionary features. Experimental results on two low-similarity benchmark datasets show that the proposed methods are effective.

In addition, a fused protein structural class prediction model based on secondary structure and evolutionary information is proposed. Experimental results show that effective feature fusion improves prediction performance compared with features from a single source. The method offers a new approach to protein structural class prediction from the perspective of multi-source information fusion.
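
To make the three aspects named in contribution (1) concrete (content, order, and distance of secondary structure elements), the following is a minimal illustrative sketch. It is not the feature set defined in the dissertation, whose formulas are not given in this abstract. It assumes a three-state predicted secondary structure string over H (helix), E (strand), and C (coil); the function names `segment_runs` and `ss_features` and the feature names are hypothetical.

```python
from itertools import groupby


def segment_runs(ss):
    """Split a secondary structure string into (state, start, length) runs."""
    runs = []
    pos = 0
    for state, group in groupby(ss):
        length = len(list(group))
        runs.append((state, pos, length))
        pos += length
    return runs


def ss_features(ss):
    """Illustrative content, order, and distance features (assumed definitions).

    ss: predicted secondary structure over the assumed alphabet H/E/C.
    """
    n = len(ss)
    if n == 0:
        raise ValueError("empty secondary structure string")
    feats = {}

    # Content: fraction of residues in each state.
    for state in "HEC":
        feats["content_" + state] = ss.count(state) / n

    runs = segment_runs(ss)

    # Order: how often consecutive helix/strand segments alternate type
    # (coil segments are skipped). Alternation is typical of mixed folds.
    he = [state for state, _, _ in runs if state in "HE"]
    switches = sum(1 for a, b in zip(he, he[1:]) if a != b)
    feats["he_switch_rate"] = switches / max(len(he) - 1, 1)

    # Distance: mean gap (in residues) between consecutive segments of the
    # same type, normalized by sequence length; 0.0 if fewer than two segments.
    for state in "HE":
        segs = [(start, length) for s, start, length in runs if s == state]
        gaps = [nxt[0] - (cur[0] + cur[1]) for cur, nxt in zip(segs, segs[1:])]
        feats["mean_gap_" + state] = (sum(gaps) / len(gaps) / n) if gaps else 0.0

    return feats


if __name__ == "__main__":
    print(ss_features("CHHHHCCEEECCHHHCEEC"))
```

A vector of such values, one per protein, could then be passed to a classifier such as the support vector machine listed in the keywords; the choice of these particular features is an assumption made only for illustration.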
Keywords/Search Tags:Protein structural class, Secondary structure, Position specific scoring matrix, Support vector machine