
Research On The Feature Extraction Approach For Protein Secondary Structure Prediction

Posted on: 2017-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Wang
Full Text: PDF
GTID: 2310330491957957
Subject: Computer technology
Abstract/Summary:
Bioinformatics is a recently emerging discipline and a very promising area for future research. It is an especially important tool in the development of medical science. It is also a comprehensive, systematic discipline that combines biology with the computational systems used to study it. In essence, it extracts regularities from biological data and characterizes their nature. Researchers therefore need to analyze large amounts of bioinformatics data before they can choose the right research direction.

Bioinformatics is developing rapidly, and the number of known DNA and amino acid sequences is growing very fast, so there is more and more data to analyze. Legacy methods such as the experimental approach have difficulty handling data at this scale. Machine learning has therefore been introduced into the field, since it can automatically identify patterns in unordered data and classify them.

This thesis uses CB513 and RS126 as data sets and a support vector machine (SVM) as the classifier, with the amino acid sequence, hydrophobicity, dipole moment, and position-specific scoring matrix (PSSM) as features. The main research work is as follows. First, we take the amino acid sequence as the feature, with a sliding window of size 13, and find that the prediction results are not very good. Second, to improve the prediction accuracy, we add hydrophobicity and dipole moment to the amino acid sequence features, and find that the accuracy improves slightly. Third, starting from the amino acid sequence, we run the PSI-BLAST program against the NR database to obtain the PSSM, use the PSSM as the feature for predicting protein secondary structure, and find that the accuracy improves greatly. Finally, we use the grid search algorithm and a genetic algorithm to optimize the parameters of the support vector machine, which improves the prediction accuracy further. The experimental results show that our method is more effective than the traditional method for predicting protein secondary structure.
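The sliding-window step in the abstract can be illustrated with a minimal sketch. The abstract gives only the window size (13); the one-hot encoding, the extra 21st "padding" slot for positions beyond the chain ends, the mapping of non-standard residues to that slot, and the names `AMINO_ACIDS` and `encode_windows` are assumptions made for illustration, not the thesis's actual implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 13                           # window size stated in the abstract


def encode_windows(sequence, window=WINDOW):
    """Build one feature vector per residue.

    Each vector is a one-hot encoding of the residues in a window centred
    on that residue. Window positions that fall outside the chain, and
    residues outside the 20 standard letters, use an extra 21st slot
    (an assumption; the abstract does not describe the encoding).
    """
    half = window // 2
    slots = len(AMINO_ACIDS) + 1          # 20 residues + 1 padding slot
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    features = []
    for centre in range(len(sequence)):
        vector = [0] * (window * slots)
        for offset in range(-half, half + 1):
            pos = centre + offset
            if 0 <= pos < len(sequence):
                slot = index.get(sequence[pos], slots - 1)
            else:
                slot = slots - 1          # beyond either end of the chain
            vector[(offset + half) * slots + slot] = 1
        features.append(vector)
    return features


# Hypothetical 10-residue fragment, used only to show the output shape.
vectors = encode_windows("MKTAYIAKQR")
print(len(vectors), len(vectors[0]))  # 10 273
```

Each residue yields a vector of 13 × 21 = 273 values under these assumptions. Vectors of this kind, paired with the secondary-structure label of the central residue, are what a classifier such as an SVM would be trained on; hydrophobicity, dipole moment, or PSSM columns could replace or extend the one-hot slots in the same windowed layout.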
Keywords/Search Tags: Protein secondary structure, Support vector machine (SVM), Extraction, Feature, Position-specific scoring matrices