Research On Ensemble Learning And Deep Learning In Membrane Protein Type Prediction

Posted on:2021-06-30

Degree:Master

Type:Thesis

Country:China

Candidate:M Y Li

GTID:2504306197455624

Subject:Computer software and theory

Abstract/Summary:

With the advent of the post-genome era, membrane protein type prediction has become a research hotspot and an important part of proteomics. As the volume of data grows, traditional approaches to determining membrane protein type, such as biological experiments, are no longer practical. Starting from the feature representation of the data, this study builds on machine learning: it transforms each membrane protein sequence into a feature vector that can be fed to a machine learning algorithm, and it achieves significant performance by using various prediction models and ensemble learning. The main contents of this thesis include feature extraction, efficient use of features, selection of ensemble strategy, construction of a deep learning model, and feature fusion, and we achieved better performance than existing methods. The specific research contents are as follows:

1. Given a protein sequence, we can obtain its amino acid composition information and its evolutionary information. Based on these two kinds of information, we can extract features of the protein and obtain a feature representation that can be fed to a machine learning algorithm. The main feature extraction methods are amino acid composition (AAC), dipeptide composition (DIPC) and the position-specific scoring matrix (PSSM). Among them, the position-specific scoring matrix is a powerful feature extraction method, but because of its particular dimensions it must be post-processed before it can be fed to traditional machine learning methods. Such post-processing inevitably loses some information. In this thesis, a deep learning model is combined with the position-specific scoring matrix to achieve good predictive performance without destroying the original matrix.

2. The information contained in a protein sequence is extremely complex. Amino acid position, sequence length and long-distance dependencies between some amino acids all determine the properties of a membrane protein and thus its class. An individual classifier with a single criterion cannot accurately capture the rich intrinsic information in a membrane protein sequence, which limits prediction performance. Ensemble learning draws on the strengths of different classifiers and combines them to bring great improvements to the prediction results. How the classifiers are combined, that is, the choice of ensemble strategy, is critical. We experimented with various ensemble strategies, including the multiplication strategy, the maximum strategy, linear weighting, exponential weighting, logarithmic weighting and stacking.

3. Each membrane protein in the membrane protein dataset is expressed as a sequence over an alphabet of 20 amino acid letters. Sequence lengths range from tens to thousands of residues, and the length distribution is not uniform. Traditional machine learning algorithms are unable to capture long-distance dependencies and internal relationships in such long sequences. We introduce a recurrent neural network so that the original position-specific scoring matrix can be fed directly into the deep learning model we constructed, which preserves the original structure of the matrix and avoids loss of information. This efficient use of the position-specific scoring matrix allows us to obtain very good prediction performance with only one feature extraction method.
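The composition features and three of the simpler ensemble strategies named above can be illustrated with a short sketch. This is not the thesis's implementation. The function names, the normalization step and the example probabilities are assumptions made for illustration, and the exponential weighting, logarithmic weighting and stacking strategies are left out because the abstract does not define them.

```python
from itertools import product
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid letters


def aac(seq):
    """Amino acid composition: frequency of each of the 20 residues.

    Assumes a non-empty sequence; non-standard letters are ignored,
    so the result sums to 1 only for purely standard sequences.
    """
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipc(seq):
    """Dipeptide composition: frequency of each of the 400 adjacent pairs.

    Assumes the sequence has at least two residues.
    """
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    return [pairs.count(a + b) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]


def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    s = sum(scores)
    return [x / s for x in scores]


# In the functions below, `probs` holds one list of class probabilities
# per classifier, all in the same class order (hypothetical layout).

def product_rule(probs):
    """Multiplication strategy: multiply the classifiers' probabilities."""
    return normalize([prod(col) for col in zip(*probs)])


def max_rule(probs):
    """Maximum strategy: keep the largest probability per class."""
    return normalize([max(col) for col in zip(*probs)])


def linear_weighted(probs, weights):
    """Linear weighting: weighted sum of the classifiers' probabilities."""
    return normalize([sum(w * p for w, p in zip(weights, col))
                      for col in zip(*probs)])


if __name__ == "__main__":
    # Two hypothetical classifiers scoring two classes.
    probs = [[0.6, 0.4], [0.7, 0.3]]
    print(product_rule(probs))
    print(max_rule(probs))
    print(linear_weighted(probs, [0.5, 0.5]))
    print(len(aac("MKTAYIAK")), len(dipc("MKTAYIAK")))
```

Both composition features have a fixed length (20 and 400) regardless of sequence length, which is why they can be passed directly to traditional classifiers, whereas the position-specific scoring matrix cannot.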

Keywords/Search Tags:

membrane protein type prediction, evolution information, high utilization of feature, ensemble learning, recurrent neural network

Related items

1	Research On Medical Image Processing Methods Based On Tensor Neural Network And Ensemble Learning Prediction Model
2	Method Research For The Prediction Of Drug's Side Effect Based On Information Integration
3	Recognition And Classification Of ECG Signals Based On Feature Extraction And Neural Network Ensemble
4	Research On Prediction Methods Based On Recurrent Neural Network And Their Applications
5	Research On Application Of Deep Learning Method In Dose Prediction Distribution Of Radiation Therapy
6	Research On Ensemble Of Multi-scale Fine-Tuning Convolutional Neural Network For Recognition Of Benign And Malignant Thyroid Nodules
7	Research On Multi-scale Diagnosis Prediction Based On Multi-dimensional Attribute Exploration Of Deep Learning
8	Study On Algorithms Of Upgrading Dimension And Ensemble Learning For Diabetes Prediction
9	Prediction Of Protein-drug Binding Affinity Based On Deep Learning Techniques
10	Multi-source Transfer Learning With Graph Neural Network For Excellent Modelling The Bioactivities Of Ligands Targeting Orphan G Protein-coupled Receptors