Research On Nucleic Acid-binding Proteins Prediction

Posted on: 2018-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Zhang
GTID: 1360330563992212
Subject: Theoretical Physics
Abstract/Summary:
A protein that binds to a nucleic acid is called a nucleic acid-binding protein and is classified as an RNA-binding protein (RBP) or a DNA-binding protein (DBP), depending on whether it binds RNA or DNA. By interacting with RNA or DNA, nucleic acid-binding proteins play important roles in a variety of cellular processes, such as gene transcription, post-transcriptional regulation and translation. Because identifying nucleic acid-binding proteins experimentally requires considerable money and time, computational approaches are needed for large-scale, high-precision prediction of which proteins may interact with RNA or DNA, so that the predictions can guide experimental design. This thesis studies the prediction of nucleic acid-binding proteins in depth, covering the prediction of RNA-binding proteins, the prediction of DNA-binding proteins and the multi-class prediction of nucleic acid-binding proteins.

For wider applicability, we predicted RNA-binding proteins from the protein sequence alone, using support vector machine (SVM) classification, and developed the RBPPred method. The protein properties used include hydrophobicity, polarity, normalized van der Waals volume, polarizability, predicted secondary structure, predicted solvent accessibility, side-chain charge and polarity, and protein evolutionary information. RBPPred achieved a sensitivity of 83%, a specificity of 96% and an MCC of 0.888 on 2780 RBPs and 7093 non-RBPs using 10-fold cross-validation. Moreover, a sensitivity of 84%, a specificity of 97% and an MCC of 0.788 were obtained on the independent human testing set. RBPPred performed much better than other state-of-the-art approaches on different datasets. In addition, we tested the capability of RBPPred to predict new RBPs, which further confirmed the practicality and predictive power of the method. Finally, we applied RBPPred to the proteomes of different species to predict the possible RNA-binding proteins in each proteome and analyzed the conserved RNA-binding domains contained in those proteins.

Building on RBPPred, we developed RBPPred 2.0 with several improvements: updated datasets, three additional important features, and an exploration of how different sequence alignment databases affect the prediction of RNA-binding proteins. The results show that RBPPred 2.0 further improves prediction performance relative to RBPPred.

In view of the excellent performance of RBPPred 2.0 in RNA-binding protein prediction, we developed the DBP-Pred and NABP-Pred methods by extending all the features used in RBPPred 2.0 to DNA-binding protein prediction and to multi-class prediction of nucleic acid-binding proteins. In testing, DBP-Pred achieved a sensitivity of 66%, a specificity of 87% and an MCC of 0.548 on a non-redundant, independent testing set of 1244 DBPs and 1244 non-DBPs using 10-fold cross-validation, performing much better than other methods.

For the multi-class prediction of nucleic acid-binding proteins, four new protein datasets (DRBP, oDBP, oRBP and non-NABP) were defined and constructed according to how the protein binds nucleic acids, so that DBPs and RBPs can be predicted with the same model. This is the first time these newly defined datasets have been used in the prediction of nucleic acid-binding proteins. NABP-Pred achieved an overall predictive accuracy of 76.08% in five-fold cross-validation on a dataset containing 212 DRBPs, 1939 oDBPs, 1314 oRBPs and 4993 non-NABPs.
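The sensitivity, specificity and MCC values quoted above are all derived from the same confusion matrix. The sketch below is illustrative only and is not taken from the thesis: the function names `confusion_counts` and `mcc` are hypothetical, and the confusion matrix is reconstructed from the rounded percentages reported for DBP-Pred (66% sensitivity, 87% specificity, 1244 DBPs and 1244 non-DBPs), so the recomputed MCC can only approximate the reported 0.548.

```python
import math


def confusion_counts(sensitivity, specificity, n_pos, n_neg):
    """Rebuild (fractional) confusion-matrix counts from summary rates.

    Hypothetical helper: the counts are estimates because the published
    sensitivity and specificity are rounded to whole percentages.
    """
    tp = sensitivity * n_pos   # positives predicted as positive
    fn = n_pos - tp            # positives predicted as negative
    tn = specificity * n_neg   # negatives predicted as negative
    fp = n_neg - tn            # negatives predicted as positive
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


# DBP-Pred figures as quoted in the abstract (rounded percentages).
tp, fp, tn, fn = confusion_counts(0.66, 0.87, n_pos=1244, n_neg=1244)
dbp_mcc = mcc(tp, fp, tn, fn)
print(f"estimated MCC: {dbp_mcc:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative on class-imbalanced sets such as the 2780 RBPs versus 7093 non-RBPs used for RBPPred, where accuracy alone could be misleading.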
Keywords/Search Tags: Nucleic acid-binding proteins prediction, RNA-binding proteins prediction, DNA-binding proteins prediction, Support vector machine (SVM), Feature encoding, Feature vector, Feature selection