Recognition Of DNA/RNA Binding Proteins Based On Sequence Information

Posted on: 2019-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2370330566996742
Subject: Computer Science and Technology
Abstract/Summary:
With the launch and development of genome projects, the number of known protein sequences is increasing exponentially. However, the number of proteins with characterized structure and function is increasing slowly. How to extract useful information from protein sequences and predict their structure and function has become an urgent and difficult problem. DNA-binding and RNA-binding proteins are two special kinds of proteins. They not only play important roles in various cellular activities but are also related to many diseases. Experimental techniques can identify these two kinds of proteins accurately, but they are expensive and place strict requirements on the experimental environment and equipment. To address these problems and design efficient, convenient recognition methods, we studied the identification of DNA/RNA-binding proteins based on protein sequence.

To improve the accuracy of sequence-based DNA-binding protein prediction, we designed an ensemble learning strategy based on weighted voting. An ensemble prediction model, iDNA-Prot-Vote, was constructed by combining three existing protein representation methods (k-mer, PDT and PDT-Profile) with the SVM algorithm. Results on two widely used datasets show that the proposed ensemble learning method significantly improves the accuracy of DNA-binding protein recognition. In addition, iDNA-Prot-Vote performs better than most existing prediction methods.

To represent protein sequences effectively, we designed three protein representations based on PSFM: PSFM-DBT, PSFM-TT and PSFM-RPT. Experimental results show that the three proposed methods outperform most existing methods, with PSFM-DBT achieving the best performance. We also analyzed the features extracted by PSFM-DBT at the molecular biology level and verified the effectiveness of PSFM-DBT. A prediction model for DNA-binding proteins was constructed using PSFM-DBT and the SVM algorithm, and a corresponding online prediction system is provided.

To identify DNA-binding proteins, RNA-binding proteins and non-DNA/RNA-binding proteins, we proposed a prediction method based on deep learning, DeepDRBP. DeepDRBP is the first sequence-based method that can distinguish among these three classes. It consists of two layers, each of which is a prediction model constructed from one kind of deep neural network and one kind of protein evolutionary profile. The first layer distinguishes DNA/RNA-binding proteins from non-DNA/RNA-binding proteins; the second layer further determines whether the query protein is a DNA-binding protein or an RNA-binding protein. Prediction results on a benchmark dataset and on new proteins extracted from Swiss-Prot show that the proposed method is effective. A corresponding online prediction system is also provided.
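
The abstract names three building blocks without giving their details: k-mer sequence features, weighted voting over several base predictors, and a two-layer decision. The following is a minimal Python sketch of how such pieces could fit together. It is not the thesis's implementation: the function names, the example weights, the 0.5 threshold, and the use of a weighted average of positive-class probabilities are all assumptions made for illustration, and the per-model probabilities would in practice come from trained classifiers (SVMs or neural networks) that are not shown here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; non-standard residues are ignored below.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence, k=2):
    """Return the normalized k-mer frequency vector of a protein sequence.

    The vector has 20**k entries, ordered lexicographically by k-mer.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only k-mers made of standard residues contribute to the total.
    total = max(sum(counts[m] for m in kmers), 1)
    return [counts[m] / total for m in kmers]


def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-model positive-class probabilities by weighted average.

    Returns (score, is_positive). The weights are hypothetical; how the
    thesis chooses its weights is not stated in the abstract.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probabilities, weights)) / total_weight
    return score, score >= threshold


def two_layer_predict(p_binding, p_dna, threshold=0.5):
    """Two-layer decision in the style the abstract describes.

    Layer 1: nucleic-acid-binding vs. non-binding (probability p_binding).
    Layer 2: DNA-binding vs. RNA-binding (probability p_dna), consulted
    only when layer 1 predicts binding.
    """
    if p_binding < threshold:
        return "non-DNA/RNA-binding"
    return "DNA-binding" if p_dna >= threshold else "RNA-binding"


if __name__ == "__main__":
    features = kmer_frequencies("MKRAAC", k=2)
    print(len(features))  # 400 entries for k=2
    # Hypothetical outputs of three base models (e.g. k-mer, PDT, PDT-Profile).
    print(weighted_vote([0.9, 0.4, 0.6], [0.5, 0.3, 0.2]))
    print(two_layer_predict(p_binding=0.8, p_dna=0.3))
```

A weighted average of probabilities is only one way to implement weighted voting; weighting hard class votes is another common choice, and the abstract does not say which variant the thesis uses.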
Keywords/Search Tags: DNA-binding protein, RNA-binding protein, PSFM-DBT, ensemble learning, deep learning