Research On Protein Secondary Structure Prediction Based On Word Frequency Statistics Coding And Manifold Learning

Posted on: 2015-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Liu
GTID: 2180330452994237
Subject: Computer application technology
Abstract/Summary:
In bioinformatics, protein structure prediction remains an unsolved problem, and protein secondary structure prediction is one of its most important research topics. In molecular biology, if protein secondary structure can be predicted successfully, the tertiary structure of protein molecules can be predicted more accurately, which is also of great significance for analyzing protein sequences and folding structures and for determining protein function. This thesis covers the following work on protein secondary structure prediction:

1. Comparing and analyzing the prediction results of different amino acid coding methods. I compared the prediction results obtained with 21-bit coding, 5-bit coding, and Profile coding. By analyzing amino acid sequences with a sliding window, I propose a new amino acid coding method based on word frequency statistics. Different classification algorithms were then run on three data sets with the four coding methods. The coding method based on word frequency statistics achieved the highest prediction accuracy, reaching 80% to 90%, much higher than the other three coding methods.

2. Establishing a protein secondary structure prediction model based on manifold learning: manifold learning methods are first used for feature extraction, and different classification algorithms are then used for secondary structure prediction. I applied three dimension reduction methods, Isomap, Laplacian Eigenmaps (LE), and Locally Linear Embedding (LLE), to the three data sets and found that LE is the most suitable for feature extraction from proteins.

3. Verifying the performance of the manifold-learning-based prediction method on the three data sets. In the experiments, the LE algorithm first maps the high-dimensional data sets to a lower-dimensional space, and SVM, NB, BP neural network, and k-nearest-neighbor classifiers are then used to predict protein secondary structure. The experimental results indicate that SVM gives the best prediction results. After dimension reduction, the word frequency statistics coding again achieves significantly higher accuracy than the other three coding methods. At the same time, the prediction method greatly improves execution efficiency.
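The abstract does not spell out how the word frequency statistics coding is computed, so the following is only a minimal sketch of one plausible reading, not the thesis's actual method. It assumes that each residue is represented by a sliding window centered on it, that a "word" is a run of `word_len` consecutive residues, and that the feature vector holds the relative frequency of each possible word inside the window. The names `PAD`, `sliding_windows`, `build_vocabulary`, `word_frequency_encode`, and `encode_sequence`, as well as the default window size of 13, are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # hypothetical padding symbol for positions past the sequence ends


def sliding_windows(sequence, window_size):
    """Yield one window per residue, centered on it and padded at both ends."""
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the window has a center")
    half = window_size // 2
    padded = PAD * half + sequence + PAD * half
    for i in range(len(sequence)):
        yield padded[i:i + window_size]


def build_vocabulary(word_len):
    """List every possible word of word_len residues in a fixed order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=word_len)]


def word_frequency_encode(window, vocabulary, word_len):
    """Return the relative frequency of each vocabulary word in the window.

    Words that contain padding or a non-standard residue are ignored
    (an assumption; the abstract does not say how such cases are handled).
    """
    valid = set(vocabulary)
    words = [window[i:i + word_len] for i in range(len(window) - word_len + 1)]
    words = [w for w in words if w in valid]
    if not words:
        return [0.0] * len(vocabulary)
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocabulary]


def encode_sequence(sequence, window_size=13, word_len=1):
    """Encode a whole sequence as one feature vector per residue."""
    vocabulary = build_vocabulary(word_len)
    return [
        word_frequency_encode(window, vocabulary, word_len)
        for window in sliding_windows(sequence, window_size)
    ]


if __name__ == "__main__":
    features = encode_sequence("ACDA", window_size=3, word_len=1)
    print(len(features), len(features[0]))
```

With `word_len=1` each residue gets a 20-dimensional vector. Larger values grow the vector to 20^`word_len` dimensions, which is the kind of high-dimensional input that a dimension reduction step such as LE would then compress before classification.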
Keywords/Search Tags:Protein Secondary Structure Prediction, Word Frequency Statistics, Manifold Learning, Support Vector Machine (SVM)