Prediction of DNA- and RNA-Binding Proteins Based on Deep Learning

Posted on: 2024-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Wei
GTID: 2530307142454544
Subject: Mathematics
Abstract/Summary:
With the development of high-throughput sequencing technology, the number of new protein sequences has increased rapidly. Uncovering the biological principles and mechanisms of action contained in nucleic acid-binding protein data provides a reference for studying the pathogenesis of various diseases and the relationships between related diseases, and can promote the development of life science, medicine, and information science. Traditional experimental methods are limited by high cost and low efficiency, so predicting DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) with deep learning methods has become a research frontier in bioinformatics. This thesis uses deep learning to predict DBPs and RBPs. The main contents are as follows:

1. We propose a new model, DRBPPred-GAT, to predict DBPs and RBPs. First, eight methods are used to extract multiple kinds of information from protein sequences: pseudo position-specific scoring matrix (PsePSSM), composition, transition and distribution (CTD), pseudo amino acid composition (PseAAC), grouped tripeptide composition (GTPC), multivariate mutual information (MMI), conjoint triad (CT), normalized Moreau-Broto autocorrelation (NMBroto), and encoding based on grouped weight (EBGW). The features extracted by the eight methods are then fused. Second, an autoencoder is used to reduce the dimension of the fused features. Finally, a multi-head attention neural network is used, for the first time, to predict DBPs and RBPs. Under 10-fold cross-validation, DRBPPred-GAT achieves an ACC of 84.32% and an AUC of 0.9219 for predicting DBPs on the training dataset, and an ACC of 83.60% and an AUC of 0.9040 for predicting RBPs. DRBPPred-GAT also achieves good prediction results on the test datasets and shows better prediction performance than other methods.

2. We propose BiLSTM-MHA for predicting DBPs and RBPs. First, protein sequence information is extracted by five methods: CTD, dipeptide deviation from expected mean (DDE), dipeptide composition (DPC), GTPC, and PseAAC. The feature vectors extracted by the five methods are then fused. Second, Group Lasso is used to reduce the dimension of the fused features, removing irrelevant features and improving prediction accuracy. Finally, the multi-head attention mechanism is combined with a bidirectional long short-term memory (BiLSTM) network to predict DBPs and RBPs. Under 10-fold cross-validation, BiLSTM-MHA achieves AUC values of 0.9090 for DBPs and 0.8913 for RBPs, which are superior to those of other published models. BiLSTM-MHA is also compared with other methods on the test dataset. The prediction results on the training and test datasets show that the proposed BiLSTM-MHA model can effectively predict DBPs and RBPs.
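To make the feature extraction and fusion step concrete, the following is a minimal Python sketch, not the thesis's actual implementation. It computes dipeptide composition (DPC), one of the descriptors named above, as the relative frequency of each of the 400 ordered amino acid pairs, and fuses feature vectors by simple concatenation. The function names are hypothetical, fusion by concatenation is an assumption, and amino acid composition is used only as a stand-in second descriptor (it is not one of the descriptors listed in the abstract).

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order so every vector is aligned.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-dimensional DPC vector for a protein sequence.

    Each entry is the count of one dipeptide divided by the number of
    valid dipeptides in the sequence. Pairs containing a non-standard
    residue are skipped (an assumption of this sketch).
    """
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(sequence) - 1):
        idx = DIPEPTIDE_INDEX.get(sequence[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def amino_acid_composition(sequence):
    """Stand-in second descriptor: 20-dimensional residue frequencies."""
    valid = [aa for aa in sequence if aa in AMINO_ACIDS]
    if not valid:
        return [0.0] * len(AMINO_ACIDS)
    return [valid.count(aa) / len(valid) for aa in AMINO_ACIDS]


def fuse_features(*vectors):
    """Fuse descriptors by concatenating them into one feature vector."""
    fused = []
    for vec in vectors:
        fused.extend(vec)
    return fused


# Usage: a short, made-up sequence for illustration only.
seq = "ACAC"
fused = fuse_features(dipeptide_composition(seq), amino_acid_composition(seq))
print(len(fused))  # 420
```

In a pipeline like the ones described, a fused vector of this kind would then pass through a dimension reduction step (an autoencoder or Group Lasso) before reaching the classifier.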
Keywords/Search Tags: deep learning, DNA-binding proteins, RNA-binding proteins, graph attention network, multi-head attention mechanism, bidirectional long short-term memory network