
The Prediction Of Secondary Structure Of Protein Based On Deep Learning

Posted on: 2019-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: S X Xie
GTID: 2310330542973649
Subject: Software engineering
Abstract/Summary:
Prediction of protein secondary structure is an important research field in bioinformatics. With the development of artificial intelligence, many researchers have begun to predict protein secondary structure using machine learning. Although some satisfying results have been achieved, further improvement is needed. In this thesis, we employ three methods to predict protein secondary structure: a fuzzy support vector machine (FSVM), a convolutional neural network (CNN) combined with an FSVM, and a CNN combined with long short-term memory (LSTM).

(1) Prediction of protein secondary structure by FSVM. First, two initial hyperplanes are constructed; starting from them, an iterative process in the feature space locates the class centers and an approximate hyperplane. Then, each sample in the training set is assigned a membership value according to its distance to the approximate hyperplane. Finally, an FSVM based on the feature space is trained on the training set. The method also exploits information on sequence-based structural similarity. On four datasets (RS126, CB513, data1199 and CASP), it achieves Q3 accuracies of 94.2%, 93.1%, 96.7% and 92.1% and SOV values of 91.7%, 89.7%, 94.1% and 89.6%, respectively.

(2) Prediction of protein secondary structure by CNN combined with FSVM. First, the vector features of each protein are transformed into matrix features; then, a CNN extracts feature representations from these input features; finally, an FSVM classifier is trained on the CNN features and used to make predictions on the test sets. On the four datasets (RS126, CB513, data1199 and CASP), this method achieves Q3 accuracies of 94.3%, 93.8%, 97.1% and 92.7% and SOV values of 92.5%, 90.4%, 94.5% and 90.2%, respectively.

(3) Prediction of protein secondary structure by CNN combined with LSTM. Since CNNs are shift-invariant, we first use multiple kernels of different sizes to extract local features. Then, to capture the long-range dependencies between residues in a protein sequence, a bidirectional LSTM extracts global features. Finally, the local and global features are combined into the final feature representation, and a softmax classifier predicts the secondary structure. On the four datasets (RS126, CB513, data1199 and CASP), this method achieves Q3 accuracies of 94.5%, 94.2%, 97.2% and 93.5% and SOV values of 92.2%, 90.3%, 94.8% and 90.2%, respectively.

Experimental results show that all three methods achieve high accuracy in protein secondary structure prediction. Finally, this thesis analyzes the shortcomings of these methods and proposes directions for further research.
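The membership step of the FSVM method can be illustrated with a small sketch. The abstract gives neither the membership function nor the details of the iterative center search, so this sketch makes explicit assumptions: it works in the input space rather than a kernel feature space, replaces the iterative search with a single hyperplane that bisects the two class centers, and maps each sample's signed distance to a membership through a logistic function. The names `bisector_hyperplane` and `fuzzy_memberships` and the parameters `k` and `floor` are hypothetical.

```python
import math

def class_center(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def bisector_hyperplane(pos, neg):
    """Hypothetical simplification: one hyperplane w.x + b = 0 that perpendicularly
    bisects the segment between the two class centers (no iteration, no kernel)."""
    c_pos, c_neg = class_center(pos), class_center(neg)
    w = [a - c for a, c in zip(c_pos, c_neg)]
    mid = [(a + c) / 2 for a, c in zip(c_pos, c_neg)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))  # hyperplane passes through the midpoint
    return w, b

def fuzzy_memberships(X, y, w, b, k=1.0, floor=0.05):
    """Assumed membership rule: logistic of the signed distance y * (w.x + b) / ||w||,
    so samples on the wrong side of the hyperplane (likely noise) get small weights."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    memberships = []
    for x, label in zip(X, y):  # labels are +1 / -1
        signed = label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        memberships.append(max(floor, 1.0 / (1.0 + math.exp(-k * signed))))
    return memberships

if __name__ == "__main__":
    # Toy 2-D data; (-1, -1) is labelled positive but lies on the negative side.
    X = [[2, 2], [3, 3], [-1, -1], [-2, -2], [-3, -3]]
    y = [1, 1, 1, -1, -1]
    w, b = bisector_hyperplane(X[:3], X[3:])
    for x, m in zip(X, fuzzy_memberships(X, y, w, b)):
        print(x, round(m, 3))  # the sample (-1, -1) gets a membership below 0.5
```

In a full pipeline, these memberships would scale each sample's slack penalty. scikit-learn's `SVC.fit` accepts per-sample weights through its `sample_weight` argument, which gives a weighted SVM in the spirit of an FSVM.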
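The CNN_FSVM method begins by turning each residue's feature vector into a matrix that a CNN can convolve over. The abstract does not say how this is done; a common arrangement, assumed here, is a sliding window of W residues centred on the target residue, each contributing F per-residue features, so the flattened vector of length W*F is reshaped into a W x F matrix. The window size, the zero padding at the chain ends, and the function names `window_vector` and `vector_to_matrix` are hypothetical.

```python
def window_vector(profile, center, window):
    """Flatten the feature rows of `window` residues centred on `center`.

    `profile` is a list of per-residue feature rows (all of length F).
    Positions beyond either end of the chain are padded with zero rows.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    n_features = len(profile[0])
    vec = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(profile):
            vec.extend(profile[pos])
        else:
            vec.extend([0.0] * n_features)  # padding outside the chain
    return vec

def vector_to_matrix(vec, n_rows, n_cols):
    """Reshape a flat feature vector (row-major) into an n_rows x n_cols matrix."""
    if len(vec) != n_rows * n_cols:
        raise ValueError("vector length does not match the requested shape")
    return [vec[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]

if __name__ == "__main__":
    # Toy chain of 4 residues with F = 3 illustrative features each.
    profile = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
    vec = window_vector(profile, center=0, window=3)  # left neighbour is padding
    print(vector_to_matrix(vec, n_rows=3, n_cols=3))
    # [[0.0, 0.0, 0.0], [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
```

Each row of the resulting matrix corresponds to one residue in the window, so a convolution along the rows mixes information from neighbouring residues.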
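The CNN_LSTM method extracts local features with several convolution kernels of different widths. The pure-Python sketch below shows only the multi-scale idea, under assumptions the abstract does not state: one filter per width, widths 3, 5 and 7, "same" zero padding so every residue keeps an output, and a ReLU after each convolution. A trained network would use many learned filters per width; the all-ones filters here are toy values, and the function names are hypothetical.

```python
def conv1d_same(seq, kernel, bias=0.0):
    """1-D convolution (cross-correlation, as in most deep-learning libraries) of a
    multi-channel sequence with one filter, zero-padded so the output has one value
    per residue. `seq` is L rows of C channels; `kernel` is K rows of C weights, K odd."""
    k = len(kernel)
    half = k // 2
    n_ch = len(seq[0])
    out = []
    for i in range(len(seq)):
        acc = bias
        for j in range(k):
            pos = i + j - half
            if 0 <= pos < len(seq):  # positions outside the chain contribute zero
                acc += sum(kernel[j][c] * seq[pos][c] for c in range(n_ch))
        out.append(acc)
    return out

def relu(xs):
    return [max(0.0, x) for x in xs]

def multi_scale_local_features(seq, kernels):
    """Apply one filter per kernel width and stack the responses for each residue."""
    maps = [relu(conv1d_same(seq, k)) for k in kernels]
    return [[m[i] for m in maps] for i in range(len(seq))]

if __name__ == "__main__":
    seq = [[1.0, 1.0] for _ in range(6)]  # toy input: 6 residues, 2 channels
    kernels = [[[1.0, 1.0]] * k for k in (3, 5, 7)]  # hypothetical all-ones filters
    print(multi_scale_local_features(seq, kernels)[0])  # [4.0, 6.0, 8.0]
```

Wider kernels see more neighbouring residues, which is why mixing widths captures local patterns at several scales.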
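For the global features, the CNN_LSTM method runs a bidirectional LSTM: one LSTM reads the sequence left to right, another right to left, and their hidden states are concatenated at each residue. The sketch below is a forward pass only, with hypothetical random weights and a hidden size chosen for illustration. It shows how each residue's output can depend on residues on both sides; it is not the thesis's trained network.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""
    def __init__(self, n_in, n_hidden, rng):
        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
        self.n_hidden = n_hidden
        # One (W, U, b) triple per gate: W acts on x_t, U on h_{t-1} (random toy weights).
        self.params = {g: (rand_matrix(n_hidden, n_in), rand_matrix(n_hidden, n_hidden),
                           [0.0] * n_hidden) for g in ("i", "f", "o", "g")}

    def step(self, x, h, c):
        pre = {g: [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
               for g, (W, U, b) in self.params.items()}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def run_lstm(cell, seq):
    h, c = [0.0] * cell.n_hidden, [0.0] * cell.n_hidden
    outputs = []
    for x in seq:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs

def bilstm(seq, fwd_cell, bwd_cell):
    """Concatenate forward and backward hidden states at each position."""
    fwd = run_lstm(fwd_cell, seq)
    bwd = run_lstm(bwd_cell, seq[::-1])[::-1]  # re-align backward outputs with residues
    return [hf + hb for hf, hb in zip(fwd, bwd)]

if __name__ == "__main__":
    rng = random.Random(0)
    seq = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]  # 5 residues, 3 toy features
    out = bilstm(seq, LSTMCell(3, 4, rng), LSTMCell(3, 4, rng))
    print(len(out), len(out[0]))  # 5 8
```

Because the backward pass has already read the whole chain when it reaches residue 0, the second half of each output vector carries information from residues far downstream, which is the long-range context the abstract refers to.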
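The final step of the CNN_LSTM method combines each residue's local and global features and passes them to a softmax classifier over the three secondary-structure states. This sketch assumes plain concatenation as the combination, a single linear layer with hypothetical weights, and the usual three-state labels H (helix), E (strand) and C (coil); the function name `predict_residue` is illustrative.

```python
import math

CLASSES = ("H", "E", "C")  # three-state labels: helix, strand, coil

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_residue(local_feat, global_feat, weights, biases):
    """Concatenate local and global features, apply a linear layer, then softmax."""
    x = local_feat + global_feat  # list concatenation
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs

if __name__ == "__main__":
    weights = [[2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]]  # hypothetical 3 x 4 layer
    label, probs = predict_residue([1.0, 0.0], [0.5, 0.5], weights, [0.0, 0.0, 0.0])
    print(label)  # H
```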
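Q3, the accuracy figure reported for every method, is the percentage of residues whose three-state label is predicted correctly. A direct computation, assuming the observed and predicted structures are equal-length strings over H, E and C and that residues are pooled rather than averaged per protein (the abstract does not say which):

```python
def q3(observed, predicted):
    """Percentage of residues whose predicted three-state label matches the observed one."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)

print(q3("CCHHHHCC", "CCHHHCCC"))  # 87.5
```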
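The SOV (segment overlap) score rewards predictions that get whole helix and strand segments right rather than individual residues. The abstract does not say which SOV variant was used; the sketch below follows the widely used SOV'99 definition, in which each overlapping pair of an observed segment s1 and a predicted segment s2 in the same state contributes (minov + delta) / maxov * len(s1), with delta = min(maxov - minov, minov, len(s1) // 2, len(s2) // 2), and the total is normalised by the lengths of the observed segments. Treat it as an illustration, not a reimplementation of the thesis's evaluation code; the function names are hypothetical.

```python
def segments(ss):
    """Split a secondary-structure string into (state, start, end) runs, end inclusive."""
    segs, start = [], 0
    for i in range(1, len(ss) + 1):
        if i == len(ss) or ss[i] != ss[start]:
            segs.append((ss[start], start, i - 1))
            start = i
    return segs

def sov99(observed, predicted, states="HEC"):
    """Segment overlap score following the SOV'99 definition, as a percentage."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    obs_segs, pred_segs = segments(observed), segments(predicted)
    total_score, normaliser = 0.0, 0
    for state in states:
        for s_state, s1_start, s1_end in obs_segs:
            if s_state != state:
                continue
            len1 = s1_end - s1_start + 1
            overlapped = False
            for p_state, s2_start, s2_end in pred_segs:
                if p_state != state:
                    continue
                minov = min(s1_end, s2_end) - max(s1_start, s2_start) + 1
                if minov <= 0:
                    continue  # segments do not overlap
                overlapped = True
                len2 = s2_end - s2_start + 1
                maxov = max(s1_end, s2_end) - min(s1_start, s2_start) + 1
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                total_score += (minov + delta) / maxov * len1
                normaliser += len1  # counted once per overlapping pair
            if not overlapped:
                normaliser += len1  # observed segment with no overlapping prediction
    return 100.0 * total_score / normaliser if normaliser else 0.0

print(sov99("CCHHHHCC", "CCHHHCCC"))  # 100.0
```

For this pair Q3 is 87.5, yet SOV is 100.0: the prediction shifts one helix boundary by a single residue, which the delta term tolerates, while splitting a helix in two would be penalised heavily.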
Keywords/Search Tags: Secondary structure of protein, FSVM, CNN_FSVM, CNN_LSTM, Sequence-based structural similarity