
Research on Prediction of Antifreeze Proteins Using the W-GDipC and LRMR-Ri Methods

Posted on: 2020-01-19  Degree: Master  Type: Thesis
Country: China  Candidate: L Deng  Full Text: PDF
GTID: 2370330575489309  Subject: Computer application technology
Abstract/Summary:
Empirical studies of antifreeze proteins show that they have broad application prospects. With the advent of the post-genome era, the protein sequence data collected in various databases has become increasingly complete, which has promoted the development of bioinformatics. Many research groups now work on biological sequence feature extraction, feature selection, and classification algorithms, and have successfully applied them to the classification and prediction of protein structure and function, but little research has been done in the field of antifreeze proteins. This thesis therefore studies in depth the feature representation and feature selection methods for antifreeze proteins and evaluates them from several aspects. The main work is summarized as follows.

First, feature representation of antifreeze proteins. Based on a study of the selected antifreeze protein sequences, we propose an improved feature extraction method, weighted generalized dipeptide composition (W-GDipC), which combines two features, generalized dipeptide composition (GDipC) and dipeptide composition (DipC), by linear weighted fusion. We also examine the fusion coefficient in the weighted fusion, which ranges from 0 to 1 in steps of 0.1. Finally, we build support vector machine (SVM), decision tree (DT), and stochastic gradient descent (SGD) classifiers on the individual features and on the W-GDipC feature, and compare them experimentally using five-fold cross-validation.

Second, feature selection for antifreeze proteins. We first introduce four common feature selection approaches from machine learning, namely Lasso, Ridge, mutual information and maximal information coefficient (MIC), and Relief (filter selection), and apply each of them to the feature representation. We then propose an improved feature selection method: an ensemble feature selection method based on ridge regression (LRMR-Ri). Finally, the improved method and the original feature selection methods are tested on an antifreeze protein dataset (binary classification) and a membrane protein dataset (multi-class classification) using different classification algorithms, and the validity of the method is verified with five evaluation metrics under five-fold cross-validation.

The experimental results show that the proposed weighted generalized dipeptide composition method not only retains the important characteristics of the two individual features but also enriches the feature representation of antifreeze protein sequences. In addition, the ensemble feature selection method based on ridge regression proposed in this thesis can, to a certain extent, avoid producing locally optimal or sub-optimal feature subsets, screens out more redundant features, and extracts a more effective feature subset for antifreeze proteins.
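The linear weighted fusion described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so several things here are assumptions: GDipC is stood in for by a gapped residue-pair composition (`gap=1`), the fusion is taken to be `w * GDipC + (1 - w) * DipC`, and the sequence and the names `pair_composition` and `weighted_fusion` are hypothetical. Only the 0 to 1 coefficient grid in steps of 0.1 comes from the text.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_composition(seq, gap=0):
    """Frequency of residue pairs separated by `gap` residues.

    gap=0 gives the ordinary dipeptide composition (DipC). A gap > 0 is
    used here only as an ASSUMED stand-in for GDipC.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def weighted_fusion(dipc, gdipc, w):
    """ASSUMED fusion form: w * GDipC + (1 - w) * DipC, element-wise."""
    return [w * g + (1.0 - w) * d for d, g in zip(dipc, gdipc)]


seq = "MKTAYIAKQR"  # hypothetical toy sequence, not from the thesis data
dipc = pair_composition(seq, gap=0)
gdipc = pair_composition(seq, gap=1)

# Fusion coefficient grid from the abstract: 0, 0.1, ..., 1.0.
coefficients = [k / 10 for k in range(11)]
fused = {w: weighted_fusion(dipc, gdipc, w) for w in coefficients}
```

Each fused vector could then be passed to a classifier and scored under five-fold cross-validation to compare coefficients, as the abstract describes.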
Keywords/Search Tags: Antifreeze protein prediction, Weighted generalized dipeptide composition, Ensemble feature selection method