
Study on Coding Schemes of Protein Secondary Structure Prediction Based on Support Vector Machines

Posted on: 2012-10-01   Degree: Master   Type: Thesis
Country: China   Candidate: X J Ye
GTID: 2210330368975187   Subject: Computer application technology
Abstract/Summary:
Studying protein structure is of great significance: analyzing protein structure, protein function, and the relationship between them is an important part of the post-genome project. Because traditional experimental methods for determining structure are both complex and time-consuming, theoretical prediction of protein structure is receiving more and more attention. Predicting tertiary structure directly is difficult, so secondary structure prediction serves as a bridge from primary structure to tertiary structure. The prediction method and the coding scheme are the two most important factors affecting prediction accuracy. This thesis focuses on how the coding scheme influences protein secondary structure prediction, and proposes a new comprehensive coding scheme for single-sequence protein secondary structure prediction.
First, the thesis introduces the research background and significance of protein secondary structure prediction and reviews the relevant background knowledge: bioinformatics fundamentals, the molecular composition and structural classification of proteins, methods for protein structure prediction (especially secondary structure prediction), and methods for evaluating prediction results.
Next, the thesis studies the Support Vector Machine (SVM) approach. Protein secondary structure prediction is a pattern recognition (classification) problem, and SVMs have particular advantages for small-sample, nonlinear, and high-dimensional pattern recognition, so this approach is adopted for the research. The thesis then examines the application of SVMs to high-dimensional pattern recognition, using personal credit evaluation as an example.
Finally, the thesis discusses the characteristics and shortcomings of several current coding schemes, and proposes a new comprehensive coding scheme for single-sequence secondary structure prediction of non-homologous or low-homology proteins.
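The abstract mentions evaluation methods for secondary structure prediction without naming them. A widely used measure for three-state prediction (helix H, strand E, coil C) is Q3, the percentage of residues whose predicted state matches the observed state, Q3 = 100 × (correctly predicted residues) / (total residues), often reported together with per-state accuracies Q_H, Q_E and Q_C. The sketch below assumes that three-state labelling; the function names `q3` and `per_state_accuracy` are hypothetical and are not taken from the thesis.

```python
from collections import Counter

STATES = "HEC"  # assumed three-state alphabet: helix, strand, coil

def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty label strings of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)

def per_state_accuracy(observed: str, predicted: str) -> dict:
    """Q_H, Q_E, Q_C: accuracy over residues whose observed state is H, E or C."""
    totals = Counter(observed)
    hits = Counter(o for o, p in zip(observed, predicted) if o == p)
    # States absent from the observed labels are skipped to avoid dividing by zero.
    return {s: 100.0 * hits[s] / totals[s] for s in STATES if totals[s]}

obs = "HHHHCCEEEC"
pred = "HHHCCCEECC"
print(q3(obs, pred))                  # 80.0
print(per_state_accuracy(obs, pred))  # H 75.0, E about 66.7, C 100.0
```

Q3 alone can hide a weak class (here strands are predicted worst), which is why per-state figures are usually reported alongside it.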
The proposed scheme takes into account not only the propensity of each amino acid to appear in each type of secondary structure but also the amino acid's hydrophobicity value, and it represents every amino acid type in binary form. Predictions were then made with several different coding schemes. The results show that the new coding scheme makes full use of the protein's structural information and is better suited to secondary structure prediction for non-homologous or low-homology proteins.
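The abstract does not give the exact layout of the comprehensive coding scheme, so the following is only a minimal sketch of how its three ingredients could be combined into SVM input vectors. It assumes: a 20-bit one-hot code as the "binary form" for each amino acid; Chou-Fasman-style propensities estimated from labelled training sequences as the per-state "trending factor"; the Kyte-Doolittle scale (scaled by 4.5 into roughly [-1, 1]) as the hydrophobicity value; and a sliding window around each residue. All function names and the toy training data are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; order fixes the one-hot bits
STATES = "HEC"                         # helix, strand, coil

# Kyte-Doolittle hydropathy scale (assumed; the thesis does not name its scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def estimate_propensities(sequences, labels):
    """P(a, s) = (share of a among residues in state s) / (share of a among all residues)."""
    aa_counts, state_counts, pair_counts = Counter(), Counter(), Counter()
    for seq, lab in zip(sequences, labels):
        for a, s in zip(seq, lab):
            aa_counts[a] += 1
            state_counts[s] += 1
            pair_counts[(a, s)] += 1
    total = sum(aa_counts.values())
    prop = {}
    for a in AMINO_ACIDS:
        for s in STATES:
            if aa_counts[a] and state_counts[s]:
                prop[(a, s)] = (pair_counts[(a, s)] / state_counts[s]) / (aa_counts[a] / total)
            else:
                prop[(a, s)] = 1.0  # neutral value when the training data has no evidence
    return prop

def encode_residue(a, prop):
    """24 features: 20 one-hot bits + 3 propensities + 1 scaled hydrophobicity."""
    bits = [1.0 if a == b else 0.0 for b in AMINO_ACIDS]
    if a not in KYTE_DOOLITTLE:  # padding symbol or non-standard residue
        return bits + [0.0] * (len(STATES) + 1)
    return bits + [prop[(a, s)] for s in STATES] + [KYTE_DOOLITTLE[a] / 4.5]

def encode_window(seq, center, half_width, prop):
    """Concatenate residue vectors for the 2*half_width+1 positions around center."""
    features = []
    for i in range(center - half_width, center + half_width + 1):
        a = seq[i] if 0 <= i < len(seq) else "-"  # pad positions outside the chain
        features.extend(encode_residue(a, prop))
    return features

# Toy training data (hypothetical), only to exercise the functions.
prop = estimate_propensities(["AAEL", "GGPA"], ["HHCC", "CCCH"])
x = encode_window("MAEL", center=0, half_width=2, prop=prop)
print(len(x))  # 5 positions x 24 features = 120
```

Each window vector would then be labelled with the observed state of its central residue and passed to an SVM trainer. Because the propensity and hydrophobicity features come from single-residue statistics rather than from aligned homologues, a scheme of this kind needs only the query sequence, which matches the single-sequence, low-homology setting the thesis targets.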
Keywords: Coding schemes, Protein, SVM, Secondary structure prediction