Protein Secondary Structure Prediction Using a Deep Learning Model

Posted on: 2019-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: F L Chang
GTID: 2370330593451042
Subject: Computer technology
Abstract/Summary:
Predicting protein secondary structure helps in understanding a protein's tertiary structure. Beyond that, it helps in understanding the protein's biological function and the interactions between protein molecules. Many computational biology methods have been developed for this problem. They include statistical methods; machine learning methods such as support vector machines, conditional random fields and Bayesian methods; and deep learning methods such as restricted Boltzmann machines, convolutional neural networks and recurrent neural networks. Current research on secondary structure prediction relies on hand-crafted feature extraction. Such features have difficulty capturing the characteristics of protein sequences and the complex nonlinear relationship between those characteristics and the secondary structure.

This thesis proposes a model that integrates a conditional random field (CRF) with a deep neural network. The model accounts for dependencies between the secondary structures of adjacent residues as well as long-range dependencies. It also describes the complex nonlinear relationship between protein sequence features and secondary structure. The spatial structure of a convolutional neural network is hard-coded, so a CNN alone cannot adequately capture the protein sequence, especially long-range interactions. To model these long-range interactions, the model adds an improved recurrent autoencoder. The autoencoder, a convolutional neural network and a bidirectional recurrent neural network together extract high-level features from the sequence. These features are then fed into a CRF classifier that predicts the secondary structure.

The experiments use the public CB513 and CullPDB data sets. Features are extracted from position-specific scoring matrices (PSSMs), and the strengths and weaknesses of this approach are compared with those of other feature extraction methods. The model achieves a Q8 accuracy of 72.5% on CullPDB and 67.5% on CB513. The results show that this model performs well compared with traditional statistical and machine learning methods. It does so by integrating a neural network with a CRF to model the complex nonlinear relationship between sequence and secondary structure.
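The abstract argues that a CNN's hard-coded spatial structure limits how far along the chain it can see. The sketch below makes that concrete, but it is a minimal pure-Python illustration and not the thesis's implementation. It shows a "same"-padded, stride-1 1D convolution over PSSM-like rows and the receptive-field formula for stacked undilated convolutions. The 20-column random input, the kernel width of 3 and the three-layer stack are all hypothetical; the abstract does not state the network's kernel sizes.

```python
import random

def conv1d_same(xs, kernel):
    """One output channel of a 'same'-padded, stride-1 1D convolution.

    xs: per-residue feature vectors (e.g. PSSM rows)
    kernel: k weight vectors (k odd), one per window offset
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(xs)):
        total = 0.0
        for offset in range(-half, half + 1):
            s = t + offset
            if 0 <= s < len(xs):  # positions outside the chain act as zero padding
                total += sum(w * x for w, x in zip(kernel[offset + half], xs[s]))
        out.append(total)
    return out

def receptive_field(kernel_sizes):
    """Residues visible to one output of stacked stride-1, undilated convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)

# Hypothetical PSSM-like input (10 residues x 20 columns) and a width-3 kernel.
rng = random.Random(0)
pssm = [[rng.uniform(0, 1) for _ in range(20)] for _ in range(10)]
kernel = [[rng.uniform(-0.5, 0.5) for _ in range(20)] for _ in range(3)]
base = conv1d_same(pssm, kernel)

changed = [row[:] for row in pssm]
changed[9] = [0.0] * 20  # alter only the last residue
print(conv1d_same(changed, kernel)[0] == base[0])  # True: residue 9 is outside residue 0's window
print(receptive_field([3, 3, 3]))  # 7
```

Each output position depends only on a fixed window of residues. Widening that window requires more or wider layers. A recurrent layer, by contrast, can in principle carry information along the whole chain.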
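The keywords list GRU, and the abstract describes a bidirectional recurrent network that supplies long-range context. A minimal sketch of a bidirectional GRU over per-residue feature vectors follows. It makes several assumptions: the common update-gate/reset-gate GRU formulation, biases omitted, random untrained weights, and a hypothetical hidden size of 4. The thesis's actual gate variant, layer sizes and training procedure are not given in the abstract.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

class GRUCell:
    """Minimal GRU cell with random, untrained weights; biases omitted for brevity."""

    def __init__(self, input_size, hidden_size, seed):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # W_* multiply the input x_t; U_* multiply the previous hidden state h_{t-1}.
        self.W_z, self.U_z = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.W_r, self.U_r = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.W_h, self.U_h = mat(hidden_size, input_size), mat(hidden_size, hidden_size)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W_z, x), matvec(self.U_z, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W_r, x), matvec(self.U_r, h))]  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        candidate = [math.tanh(a + b) for a, b in zip(matvec(self.W_h, x), matvec(self.U_h, reset_h))]
        # The new state interpolates between the old state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]

    def run(self, xs):
        """Process the sequence left to right and return the hidden state at every step."""
        h = [0.0] * self.hidden_size
        states = []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states

def bigru(xs, hidden_size=4):
    """Concatenate forward and backward GRU states for each residue."""
    forward = GRUCell(len(xs[0]), hidden_size, seed=1).run(xs)
    backward = GRUCell(len(xs[0]), hidden_size, seed=2).run(xs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Hypothetical PSSM-like input: 10 residues x 20 columns of random values.
rng = random.Random(42)
pssm = [[rng.uniform(0, 1) for _ in range(20)] for _ in range(10)]
features = bigru(pssm)
print(len(features), len(features[0]))  # 10 8
```

Each residue receives a vector whose first half summarizes the residues to its left and whose second half summarizes those to its right. The backward half of the first residue's vector therefore changes when only the last residue changes, which a fixed-width convolution window cannot do.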
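The CRF classifier scores whole label sequences rather than single residues. A linear-chain CRF adds a transition score for each pair of adjacent labels to the per-residue (emission) scores produced by the network. Prediction then selects the highest-scoring sequence with the Viterbi algorithm. The sketch below covers decoding only and assumes the emission and transition scores are already available. The two-label alphabet and the hand-set scores are hypothetical stand-ins for a trained model's eight-state outputs. A brute-force search over all paths is included to check the result.

```python
import itertools

def viterbi(emissions, transitions):
    """Highest-scoring label path for a linear-chain CRF.

    emissions[t][j]: score of label j at residue t (e.g. network output)
    transitions[i][j]: score of label i at residue t followed by label j at t + 1
    Returns (path, score), where path is a list of label indices.
    """
    n_labels = len(transitions)
    best = list(emissions[0])  # best score of any path ending in each label so far
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_labels):
            i = max(range(n_labels), key=lambda k: best[k] + transitions[k][j])
            new_best.append(best[i] + transitions[i][j] + emissions[t][j])
            pointers.append(i)
        best = new_best
        backpointers.append(pointers)
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):  # follow pointers back to residue 0
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

def path_score(path, emissions, transitions):
    """Total CRF score of one label path."""
    score = sum(emissions[t][j] for t, j in enumerate(path))
    score += sum(transitions[a][b] for a, b in zip(path, path[1:]))
    return score

# Toy example with a hypothetical two-label alphabet (H = helix, C = coil).
LABELS = "HC"
emissions = [[2, 0], [0, 1], [2, 0]]  # residue 1 on its own prefers C
transitions = [[0, -2], [-2, 0]]      # switching label costs 2
path, score = viterbi(emissions, transitions)
print("".join(LABELS[j] for j in path), score)  # HHH 4

brute = max(itertools.product(range(len(LABELS)), repeat=len(emissions)),
            key=lambda p: path_score(p, emissions, transitions))
print(path_score(brute, emissions, transitions) == score)  # True
```

Here the middle residue's own scores favor C, but the transition penalties make an unbroken helix the best overall path. This is the kind of adjacent-residue dependency the CRF layer is meant to capture.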
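The results above are reported as Q8 accuracy. Q8 is the fraction of residues whose predicted eight-state label (the DSSP classes H, G, I, E, B, T, S and coil) matches the observed one. A minimal sketch follows. It assumes coil is written as L (some data sets use C or '-'), and it assumes the data-set score is pooled over all residues rather than averaged per chain. The abstract does not say which convention the thesis uses.

```python
Q8_ALPHABET = set("HGIEBTSL")  # assumption: 'L' marks coil

def q8_counts(predicted, observed):
    """Return (correct, total) residue counts for one chain."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    unknown = (set(predicted) | set(observed)) - Q8_ALPHABET
    if unknown:
        raise ValueError(f"labels outside the Q8 alphabet: {sorted(unknown)}")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct, len(observed)

def pooled_q8(chains):
    """Q8 accuracy pooled over all residues of all (predicted, observed) chains."""
    correct = total = 0
    for predicted, observed in chains:
        c, n = q8_counts(predicted, observed)
        correct += c
        total += n
    if total == 0:
        raise ValueError("no residues to score")
    return correct / total

print(pooled_q8([("HHHHEEETLL", "HHHHEEELLS")]))                # 0.8
print(pooled_q8([("HHHHEEETLL", "HHHHEEELLS"), ("EE", "EB")]))  # 0.75
```

Pooling weights long chains more heavily than a per-chain average would. The two conventions can therefore give different figures on the same predictions.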
Keywords/Search Tags: Protein Secondary Structure Prediction, Deep Learning, GRU, Autoencoder