A Model Stacking Framework For Identifying DNA Binding Proteins By Orchestrating Multi-view Features And Classifiers

Posted on: 2019-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: X J Liu
Full Text: PDF
GTID: 2370330626452082
Subject: Computer Science and Technology
Abstract/Summary:
DNA-binding proteins are important for cellular processes such as DNA replication, DNA repair, and DNA modification. Identifying them with experimental methods such as X-ray crystallography is time-consuming and costly, so many machine-learning methods that use only sequence information have been proposed. For these methods, the choice of feature extraction methods and classifiers is crucial. Understanding what the different feature spaces and classifiers capture, and measuring how much each contributes, matters both for prediction performance and for biological experiments. In this study, we propose a stacking model that orchestrates multi-view features and classifiers to identify DNA-binding proteins. The model has two layers: the first consists of support vector machines (SVMs) and the second is a logistic regression. First, four SVMs are trained, one on each of four feature representations. Second, the prediction probabilities of the four SVMs are fed into the logistic regression, whose output is the final prediction of the model. The four feature extraction methods are Local_DPP, PSSM_DWT, 188D, and autocovariance features extracted from predicted protein secondary structure information. Local_DPP and PSSM_DWT are based on the evolutionary information of sequences; 188D is based on physicochemical properties and sequence composition. The stacking model achieves an accuracy of 83.53% on the training dataset PDB1075 and 81.72% on the independent dataset PDB186. The experimental results show that the model performs better than most existing models and that it can flexibly coordinate different prediction models to improve performance.
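The second layer described above can be illustrated with a minimal sketch. It is not the thesis implementation: the first-layer SVM probabilities are replaced by synthetic numbers, the logistic regression is fitted with plain batch gradient descent, and every name (`fit_logistic`, `synthetic_first_layer`, the per-view noise levels) is a hypothetical choice made for illustration. The view names serve only as labels for the four input columns.

```python
import math
import random

# Labels for the four first-layer views (names taken from the abstract).
VIEWS = ["Local_DPP", "PSSM_DWT", "188D", "SS_autocovariance"]


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Second-layer probability that a protein is DNA-binding."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def synthetic_first_layer(n, seed):
    """Stand-in for the four SVM probabilities per protein. NOT real data."""
    rng = random.Random(seed)
    noise = [0.15, 0.20, 0.30, 0.40]  # assumed per-view noise, illustration only
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.7 if label else 0.3
        X.append([min(1.0, max(0.0, rng.gauss(center, s))) for s in noise])
        y.append(label)
    return X, y


X_train, y_train = synthetic_first_layer(400, seed=0)
X_test, y_test = synthetic_first_layer(200, seed=1)

weights, bias = fit_logistic(X_train, y_train)
probs = [predict_proba(weights, bias, x) for x in X_test]
preds = [int(p >= 0.5) for p in probs]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)

for name, wj in zip(VIEWS, weights):
    print(f"{name}: weight {wj:.2f}")
print(f"accuracy on synthetic hold-out: {accuracy:.2f}")
```

In a sketch like this, the fitted weights give one rough way to compare how much each view contributes to the final decision. In a real pipeline the four input columns would come from the trained SVMs; using out-of-fold probabilities for the training rows is a common precaution against leakage, though the abstract does not say whether the thesis does so.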
Keywords/Search Tags: DNA-binding proteins, model stacking, logistic regression, multi-view features