Research Of Protein Secondary Structure Prediction Based On Ensemble Learning

Posted on:2021-03-21

Degree:Master

Type:Thesis

Country:China

Candidate:H L Liang

Full Text:PDF

GTID:2370330611465679

Subject:Software engineering

Abstract/Summary:

The study of protein structure and function is one of the most important topics in modern bioinformatics and computational biology. Data mining and machine learning methods are often used to perform prediction or pattern recognition tasks and to support experimental analysis. In recent years, deep learning has been widely used in sequence analysis, but it still suffers from long training times and poor parallelism. Ensemble learning algorithms can save training time by running in a highly parallel manner, and can also quickly improve the overall prediction accuracy of simple models. However, the combination of ensemble learning methods with neural networks has seldom been studied. To this end, this thesis studies 8-state classification in protein secondary structure prediction, building on the ensemble methods Bagging, Boosting and Stacking together with convolutional neural networks (CNNs). The main contributions of the thesis are as follows:

(1) A hybrid model based on Bagging and CNN is proposed. The model replaces traditional simple classifiers such as SVM with deep CNNs, trains multiple deep CNNs in parallel, and integrates their predictions by relative majority (plurality) voting, which effectively improves prediction accuracy. Further, a new method for calculating classifier coefficients and a new feature selection method are proposed to improve the overall predictive ability of the model. The experimental results show that the Bagging model using CNNs as homogeneous weak classifiers raises the accuracy of secondary structure prediction from 66% for a single CNN to 73%.

(2) A hybrid model based on Boosting and CNN is proposed, which uses Adaboost as an instance of Boosting. The model treats multiple CNNs as homogeneous weak classifiers and uses the SAMME method for optimization. Furthermore, a hybrid model that combines multiple Adaboost strong classifiers through the Bagging method is proposed. Experiments show that the algorithm achieves a training accuracy of 97.00%, with prediction accuracy reaching up to 77%. On the public data set CB513 it achieves an accuracy of 74.29%, exceeding the 70.3% reported by prior state-of-the-art work.

(3) A hybrid model based on Stacking and CNN is proposed. The algorithm divides the data set by K-fold cross-validation, and its training process combines the characteristics of Bagging and Boosting. It can also stack multiple layers of heterogeneous weak classifiers to improve the feature extraction ability of the model. Further, a method that partitions the original data set according to protein sequence length is proposed and combined with the original hybrid model. Experiments show that the algorithm further improves the prediction accuracy of heterogeneous weak classifiers. Using the sequence-length partitioning method combined with the Adaboost model, an accuracy of 76.71% is achieved on the public data set Cull PDB, exceeding the highest previously reported accuracy of 74.0%.
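As an illustration of two ideas named in the abstract, the sketch below shows relative majority (plurality) voting over per-residue predictions and the standard SAMME classifier weight for a K-class problem. It is a minimal sketch and not the thesis's implementation: the function names, the tie-breaking rule, the use of strings as label sequences, and the example letters H, E and C are all assumptions made for illustration. The thesis's own classifier coefficient calculation method is not described in the abstract and is not reproduced here.

```python
import math
from collections import Counter

NUM_CLASSES = 8  # 8-state secondary structure classification, as in the abstract


def plurality_vote(predictions):
    """Combine per-residue label sequences from several models.

    predictions: list of equal-length strings, one per model (hypothetical format).
    Ties go to the tied label predicted by the earliest model (an assumption).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all prediction sequences must have the same length")
    voted = []
    for labels in zip(*predictions):  # labels for one residue, in model order
        counts = Counter(labels)
        top = max(counts.values())
        voted.append(next(lab for lab in labels if counts[lab] == top))
    return "".join(voted)


def samme_alpha(weighted_error, num_classes=NUM_CLASSES):
    """Standard SAMME weight: log((1 - err) / err) + log(K - 1).

    The weight is positive whenever err < (K - 1) / K, i.e. whenever the
    weak classifier does better than random guessing over K classes.
    """
    if not 0.0 < weighted_error < 1.0:
        raise ValueError("weighted_error must lie strictly between 0 and 1")
    return math.log((1.0 - weighted_error) / weighted_error) + math.log(num_classes - 1)


def weighted_vote(predictions, alphas):
    """Per-residue vote in which each model's label counts with its weight."""
    if len(predictions) != len(alphas):
        raise ValueError("need exactly one weight per model")
    voted = []
    for labels in zip(*predictions):
        scores = {}
        for label, alpha in zip(labels, alphas):
            scores[label] = scores.get(label, 0.0) + alpha
        voted.append(max(scores, key=scores.get))
    return "".join(voted)
```

With three hypothetical models predicting "HHEC", "EEEC" and "EECC", plurality voting follows the two models that agree, while a weighted vote with weights 3.0, 1.0 and 1.0 follows the first model wherever its weight exceeds the combined weight of the others. This is the practical difference between the unweighted Bagging combination and a Boosting-style combination.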

Keywords/Search Tags:

protein secondary structure, convolutional neural network, Bagging, Adaboost, Stacking

Related items

1	The Protein Secondary Structure Prediction Based On Convolutional Neural Network
2	Prediction Of Protein Secondary Structure Based On Scattering Convolutional Neural Network
3	Research And Application Of Protein Secondary Structure Classification Based On Neural Network
4	Research On The Method Of RNA Secondary Structure Prediction Based On Convolutional Neural Network
5	Research On RNA Secondary Structure Prediction Based On U-net Convolutional Neural Networks
6	Algorithm Research Of Protein Secondary Structure Prediction Based On Grouped Multi-Classifier
7	Application Of Deep Learning Algorithm In Protein Structure Prediction
8	Research On Protein Secondary Structure Prediction Based On Deep Learning Method
9	The Research On The Application Of Artificial Neural Network In The Predication Of The Secondary Structure Of Protein
10	Noise Tolerating Capability Of Bagging-based Neural Networks And Their Application