Prediction of DNA and RNA Binding Proteins Based on Machine Learning

Posted on: 2022-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Zhang
GTID: 2480306548997449
Subject: Statistics
Abstract/Summary:
DNA- and RNA-binding proteins play important roles in gene regulation, alternative splicing and transcriptional control. Abnormalities in these proteins can lead to kidney disease, cancer, diabetes and other human diseases. Mining the biological patterns in DNA- and RNA-binding protein data is therefore key to understanding the mechanisms of intermolecular interaction, and to disease diagnosis and drug research and development. However, traditional experimental methods are inefficient and constrained by experimental conditions, so they cannot meet research needs. As a result, predicting DNA- and RNA-binding proteins with machine learning has become a research frontier in bioinformatics. This thesis studies the prediction of DNA- and RNA-binding proteins based on machine learning. The main research contents are as follows:

1. A new method, StackPDB, is proposed for predicting DNA-binding proteins based on stacked ensemble learning. First, protein sequence features are extracted using pseudo amino acid composition, pseudo position-specific scoring matrix, position-specific scoring matrix transition probability composition, evolutionary distance transformation and residue probing transformation. Second, the optimal feature subset is selected by XGBoost recursive feature elimination and fed into XGBoost and LightGBM, whose output probabilities serve as the input to a support vector machine. Finally, this support vector machine, acting as the meta-classifier, outputs the final prediction of whether a protein binds DNA. Under leave-one-out cross-validation, StackPDB achieves an ACC of 93.44% and an MCC of 0.8687 on the training dataset. On the independent test sets PDB186 and PDB180, its ACC is 84.41% and 90.00% and its MCC is 0.6882 and 0.7997, respectively. These results show that StackPDB has good predictive performance and can effectively predict DNA-binding proteins.

2. A new method, DEEPStack-RBP, is proposed for predicting RNA-binding proteins by combining deep learning and ensemble learning. First, protein sequence information is extracted using conjoint triad, pseudo amino acid composition, local descriptor, multivariate mutual information and position-specific scoring matrix transition probability composition, and the five feature sets are fused. An autoencoder is then used, for the first time, to remove noise and redundancy, and the synthetic minority oversampling technique combined with edited nearest neighbours (SMOTE-ENN) is used to balance the positive and negative samples. Finally, deep learning and ensemble learning are combined for the first time: the optimal feature subset is fed into a stacking classifier that integrates a bidirectional long short-term memory network, a gated recurrent unit and a support vector machine. Under 10-fold cross-validation, DEEPStack-RBP achieves an ACC of 98.76% and an MCC of 0.9508 on the training set RBP9873. On the Human, S. cerevisiae and A. thaliana datasets, its ACC is 97.16%, 97.67% and 99.57% and its MCC is 0.9405, 0.9499 and 0.9906, respectively. These results show that DEEPStack-RBP can overcome the shortcomings of existing models and can serve as a powerful tool for predicting RNA-binding proteins.
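Of the sequence features listed for DEEPStack-RBP, the conjoint triad is the simplest to illustrate. The sketch below makes several assumptions. It uses the commonly cited seven-class grouping of amino acids by dipole and side-chain volume, and it applies min-max scaling to the 343 triad counts. The thesis may group or normalise differently, and the names `CT_GROUPS`, `TRIAD_INDEX` and `conjoint_triad` are hypothetical.

```python
from itertools import product

# Assumed 7-class grouping of the 20 standard amino acids (dipole and
# side-chain volume); the thesis may use a different grouping.
CT_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CT_GROUPS) for aa in group}

# All 7**3 = 343 class triads, in a fixed order.
TRIADS = list(product(range(len(CT_GROUPS)), repeat=3))
TRIAD_INDEX = {t: i for i, t in enumerate(TRIADS)}


def conjoint_triad(sequence: str) -> list[float]:
    """Return a 343-dimensional conjoint-triad vector for a protein sequence.

    Each entry counts one triad of consecutive amino-acid classes. Counts are
    min-max scaled to [0, 1] (an assumed normalisation). Windows containing a
    non-standard residue are skipped.
    """
    counts = [0] * len(TRIADS)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    for k in range(len(classes) - 2):
        window = classes[k:k + 3]
        if None in window:
            continue  # non-standard residue in this window
        counts[TRIAD_INDEX[tuple(window)]] += 1
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0.0] * len(counts)
    return [(c - lo) / (hi - lo) for c in counts]
```

For example, `conjoint_triad("MKVLAAGV")` returns a 343-element vector. Its largest entry, 1.0, belongs to the all-class-0 triad, which occurs twice in that sequence.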
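Both methods use position-specific scoring matrix transition probability composition. This feature summarises how the evolutionary profile at one residue relates to the profile at the next. The sketch below shows one row-normalised formulation and assumes the L × 20 PSSM from PSI-BLAST is already loaded as a list of rows. It also assumes raw scores are squashed with the logistic function. The normalisation and the name `pssm_tpc` are assumptions, not the thesis's exact definition.

```python
import math


def pssm_tpc(pssm: list[list[float]]) -> list[float]:
    """Row-normalised PSSM transition probability composition (400 values).

    `pssm` is an L x 20 matrix of raw scores. Each score is mapped through the
    logistic function (assumed normalisation) to p[k][i], and then
        y[i][j] = sum_k p[k][i] * p[k+1][j] / sum_j' sum_k p[k][i] * p[k+1][j']
    so that every row of the 20 x 20 matrix y sums to 1.
    """
    if len(pssm) < 2:
        raise ValueError("need at least two residues")
    p = [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in pssm]
    n = len(p[0])
    trans = [[0.0] * n for _ in range(n)]
    # Accumulate products of profile values at consecutive positions k, k+1.
    for k in range(len(p) - 1):
        for i in range(n):
            for j in range(n):
                trans[i][j] += p[k][i] * p[k + 1][j]
    features = []
    for row in trans:
        total = sum(row)  # always > 0 because logistic outputs are positive
        features.extend(x / total for x in row)
    return features
```

For an all-zero PSSM, every profile value is 0.5, so each of the 400 outputs equals 1/20.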
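In StackPDB, XGBoost recursive feature elimination repeatedly ranks features by model importance and discards the weakest. The loop below sketches that procedure generically, with `importance_fn` as a pluggable scorer. The scorer `mean_gap_importance` (the absolute difference of class means) is a hypothetical stand-in for XGBoost importances, chosen so the example runs with the standard library only.

```python
from typing import Callable, Sequence

Matrix = Sequence[Sequence[float]]
ImportanceFn = Callable[[Matrix, Sequence[int]], Sequence[float]]


def mean_gap_importance(X: Matrix, y: Sequence[int]) -> list[float]:
    """Hypothetical stand-in for XGBoost feature importances: the absolute
    gap between the class-1 and class-0 means of each feature."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    return [
        abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
        for j in range(len(X[0]))
    ]


def recursive_feature_elimination(
    X: Matrix, y: Sequence[int], importance_fn: ImportanceFn, n_keep: int, step: int = 1
) -> list[int]:
    """Refit the importance model on the surviving columns and drop the
    `step` least important ones until `n_keep` columns remain.
    Returns the original indices of the kept columns."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        subset = [[row[j] for j in active] for row in X]
        scores = importance_fn(subset, y)
        n_drop = min(step, len(active) - n_keep)
        # Positions (within `active`) of the lowest-scoring columns.
        worst = set(sorted(range(len(active)), key=lambda p: scores[p])[:n_drop])
        active = [col for p, col in enumerate(active) if p not in worst]
    return active
```

With a real XGBoost model, `importance_fn` would fit a classifier on the current columns and return its feature importances. The values of `n_keep` and `step` are illustrative; the subset size could instead be chosen by cross-validated performance.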
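DEEPStack-RBP balances its positive and negative samples with SMOTE-ENN. This technique first oversamples the minority class by interpolation and then removes samples whose neighbours mostly carry the other label. The sketch below is a simplified binary version using only the standard library, under these assumptions: Euclidean distance, a majority-vote editing rule, and hypothetical function names. Library implementations offer more options, such as stricter editing rules.

```python
import math
import random
from collections import Counter


def _nearest(points, query, k, exclude=None):
    """Indices of the k points closest to `query` (Euclidean), skipping index `exclude`."""
    ranked = sorted(
        (math.dist(p, query), i) for i, p in enumerate(points) if i != exclude
    )
    return [i for _, i in ranked[:k]]


def smote(minority, n_new, k=5, rng=None):
    """Create `n_new` synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(_nearest(minority, minority[i], k, exclude=i))
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def enn(X, y, k=3):
    """Edited nearest neighbours (majority-vote rule): keep a sample only if
    most of its k nearest neighbours share its label."""
    kept_X, kept_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        neighbours = _nearest(X, x, k, exclude=i)
        agree = sum(1 for j in neighbours if y[j] == label)
        if 2 * agree > len(neighbours):
            kept_X.append(x)
            kept_y.append(label)
    return kept_X, kept_y


def smote_enn(X, y, k_smote=5, k_enn=3, seed=0):
    """Binary SMOTE-ENN: oversample the minority class up to the majority
    count, then clean the combined set with ENN."""
    counts = Counter(y)
    minority_label = min(counts, key=counts.get)
    majority_label = max(counts, key=counts.get)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = counts[majority_label] - counts[minority_label]
    synthetic = smote(minority, n_new, k_smote, random.Random(seed))
    return enn(list(X) + synthetic, list(y) + [minority_label] * n_new, k_enn)
```

Because SMOTE only interpolates, every synthetic sample lies between two real minority samples. ENN then removes points that sit in regions dominated by the other class.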
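Both StackPDB and DEEPStack-RBP feed the probabilities from base classifiers into a support vector machine meta-classifier. One common way to build these level-1 inputs without leaking training labels is out-of-fold prediction, sketched below. The function `nearest_centroid_trainer` is a hypothetical stand-in for the XGBoost, LightGBM, BiLSTM or GRU base learners, and the 5-fold split is an assumed setting.

```python
import math
import random
from typing import Callable, Sequence

Vector = Sequence[float]
# A trainer fits on (X, y) and returns a function mapping a sample to P(y = 1).
Trainer = Callable[[list, list], Callable[[Vector], float]]


def kfold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[f::k] for f in range(k)]


def out_of_fold_probabilities(X, y, trainers: Sequence[Trainer], k: int = 5, seed: int = 0):
    """Level-1 features for stacking: entry [i][m] is base learner m's
    probability for sample i, from a model trained without sample i's fold."""
    meta = [[0.0] * len(trainers) for _ in X]
    for fold in kfold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        for m, trainer in enumerate(trainers):
            predict = trainer(train_X, train_y)
            for i in fold:
                meta[i][m] = predict(X[i])
    return meta


def nearest_centroid_trainer(X, y):
    """Hypothetical stand-in base learner: logistic of the difference between
    the distances to the class-0 and class-1 centroids."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)
    # Closer to the class-1 centroid gives a probability above 0.5.
    return lambda x: 1.0 / (1.0 + math.exp(math.dist(x, c1) - math.dist(x, c0)))
```

The resulting `meta` rows would then be used to train the SVM meta-classifier. To score new proteins, each base learner would be refit on the full training set. Each training fold must contain both classes for the centroid stand-in to work.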
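The ACC and MCC figures reported above are assumed here to follow the standard definitions, with 1 marking a binding protein:

ACC = (TP + TN) / (TP + TN + FP + FN)

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

A minimal sketch of both metrics:

```python
import math


def acc_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary labels (1 = binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

Consider three binding and three non-binding proteins, two of each classified correctly. In that case, `acc_mcc([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])` gives ACC ≈ 0.667 but MCC ≈ 0.333. This is why MCC is the stricter of the two measures.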
Keywords/Search Tags: DNA-binding proteins, RNA-binding proteins, multi-information fusion, bidirectional long short-term memory network, gated recurrent unit, stacking ensemble