
Algorithm Research Of Protein Secondary Structure Prediction Based On Grouped Multi-Classifier

Posted on: 2020-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2370330575987991
Subject: Computer application technology
Abstract/Summary:
Protein secondary structure prediction is an important topic in protein structure prediction and in understanding protein structure and function. The main task is to correctly identify the secondary structure corresponding to a protein amino acid sequence from its encoded features. In this paper, the 25PDB protein sequence dataset is used, and amino acid sequences are processed into pseudo-images by a sliding window combined with PSSM coding and orthogonal coding. Three types of training model are selected: CNN, LSTM neural network, and random forest. A group experiment is performed for each type of model, and the model is further optimized within that experiment.

In the group experiment based on CNN, a general CNN model consisting of two network structural units is designed. Each unit contains a convolutional layer and a downsampling layer. Because a pseudo-image carries less information for a CNN than a real image does, this paper designs a residual convolutional neural network that increases input redundancy and addresses the gradient deviation of the general CNN. Experimental results show that this residual convolutional neural network is more stable and more accurate.

In the group experiment based on the LSTM neural network, experiments are first performed on a general LSTM neural network, using sequence data generated by slicing the pseudo-image along each of its two dimensions. For the general LSTM neural network, direct slicing destroys the context of the protein amino acid sequence. This paper therefore applies a sliding window along the protein sequence dimension to generate multiple BP neural network hidden layers, and the outputs of their hidden-layer neurons serve as the sequence data fed to the LSTM network. Experimental results show that the LSTM neural network with BP neural network hidden layers extracts the contextual features of protein sequences more effectively.

In the group experiment based on random forest, the sample features extracted by the residual convolutional neural network at its average-pooling layer are used as the input of the random forest, which is equivalent to adding a feature extractor to the random forest. Experimental results show that the prediction results of the random forest are greatly improved once the feature extractor is added.

After the group experiments, this paper uses an ensemble method to integrate the three optimized models: the residual convolutional neural network, the LSTM neural network with BP neural network hidden layers, and the random forest with the added feature extractor. The ensemble sums the outputs of all member models for each secondary structure label and then outputs the label with the highest total score. Experimental results show that the prediction results of the ensemble model improve on those of all three member models.
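The score-summing ensemble rule described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the three-state label set (H, E, C), the function name `ensemble_predict`, and the score values are all assumptions made for the example, and each member model is assumed to emit one score per label for every residue.

```python
import numpy as np

# Assumed 3-state label set (helix, strand, coil); the abstract does not
# state which label set the thesis uses.
LABELS = ("H", "E", "C")


def ensemble_predict(member_scores):
    """Sum per-label scores over member models and pick the top label.

    member_scores: list of array-likes, each of shape (n_residues, n_labels).
    Returns an integer array of shape (n_residues,) holding label indices.
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in member_scores])
    summed = stacked.sum(axis=0)   # shape: (n_residues, n_labels)
    return summed.argmax(axis=1)   # index of the highest summed score


# Made-up scores for two residues from three hypothetical member models.
cnn = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
lstm = [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]
forest = [[0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]

pred = ensemble_predict([cnn, lstm, forest])
print([LABELS[i] for i in pred])  # prints ['H', 'C']
```

For the second residue the hypothetical CNN alone would choose E, but the summed scores favor C, which shows how the sum rule lets the other members outweigh a single model.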
Keywords/Search Tags: Protein secondary structure prediction, convolutional neural network, LSTM, random forest, ensemble