Analysis And Prediction Of Transcription Factor Binding Sites And Animal Toxins

Posted on: 2011-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Yang
GTID: 1100360305991368
Subject: Biophysics
Abstract/Summary:
Identifying transcription factor binding sites is an important step towards understanding transcriptional regulation. Reliable prediction of these sites can help identify the target genes of transcription factors and infer the relationship between binding-site position and the regulatory activity of transcription factors. However, the specificity achieved by current algorithms is quite low, so algorithms that identify binding sites more effectively are required.

Animal toxins act on a wide variety of pharmacological targets, which makes them good tools for studying the properties of those targets. They are used in studies of ion channels, in drug discovery and in the formulation of insecticides. Predicting animal toxins has therefore become very important, and a computational method for identifying them is needed.

In this thesis, six prediction problems are addressed: transcription factor binding sites, animal toxins, neurotoxins, cytotoxins, presynaptic neurotoxins and postsynaptic neurotoxins. The predictions use the position correlation scoring function (PCSF), increment of diversity (ID), support vector machine (SVM) and Naive Bayes classifier (NB). The main contributions are summarized as follows:

First, 8 non-redundant, experimentally known transcription factor binding sites are extracted from the JASPAR database. Based on pseudo-counts and a conservation analysis of the binding sites, a novel position correlation scoring function (PCSF) algorithm is proposed. To reduce false positives, optimal cutoffs are defined for the PCSF.
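The abstract does not give the PCSF formula, so the following is not PCSF. It is only a minimal sketch of how pseudo-counts and a score cutoff work for a plain position-independent log-odds matrix (a position weight matrix). The toy sites, the pseudo-count of 1, the uniform background of 0.25 and the cutoff of 5.0 are all assumptions chosen for illustration; none of them come from the thesis.

```python
import math

BASES = "ACGT"

def build_log_odds(sites, pseudo=1.0, background=0.25):
    """Build a log-odds matrix from aligned, equal-length binding sites.

    A pseudo-count is added to every base count so that bases never
    observed at a position still get a finite (negative) score.
    """
    length = len(sites[0])
    total = len(sites) + pseudo * len(BASES)
    matrix = []
    for pos in range(length):
        counts = {base: pseudo for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        matrix.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return matrix

def score(matrix, window):
    """Sum the per-position log-odds scores of one window."""
    return sum(column[base] for column, base in zip(matrix, window))

def scan(matrix, sequence, cutoff):
    """Return (start, score) for every window scoring at or above the cutoff."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window_score = score(matrix, sequence[start:start + width])
        if window_score >= cutoff:
            hits.append((start, window_score))
    return hits

# Invented toy sites, not taken from JASPAR.
toy_sites = ["TATAAA", "TATATA", "TATAAT", "TTTAAA"]
toy_matrix = build_log_odds(toy_sites)
hits = scan(toy_matrix, "GGCGTATAAAGGC", cutoff=5.0)
```

Raising the cutoff removes weak matches (fewer false positives) at the cost of missing degenerate sites, which is the trade-off an optimal cutoff is meant to balance.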
Testing is performed to compare the recognition accuracy of the PCSF algorithm with that of the position weight matrix (PWM) used in MATCH™; the results indicate that the PCSF algorithm performs better than the PWM algorithm.

Second, the animal toxin sequences are downloaded from the Animal Toxin Database (ATDB), and the non-toxin dataset described in the work of Saha and Raghava is used as the negative dataset. Both the animal toxin and non-toxin datasets are culled with the PISCES software, yielding datasets with less than 25%, 40%, 60%, 80% and 90% sequence identity. Based on 20 amino acid compositions, 400 dipeptide compositions, 6 amino acid hydropathy compositions and 36 hydropathy dipeptide compositions, the ID algorithm is applied to predict animal toxins and non-toxins. The best results are obtained when dipeptide compositions are selected as the input parameters. To improve the success rates for animal toxins, the 4 kinds of ID values are combined as input parameters for an SVM, and the overall prediction accuracy of the SVM is better than that of the ID algorithm. In addition, neurotoxins and cytotoxins are also predicted. To compare the SVM with other approaches, it is also used to predict the neurotoxins described in the work of Saha and Raghava, and it achieves higher success rates than the previous algorithms.

Finally, the protein sequences of presynaptic and postsynaptic neurotoxins are obtained from Swiss-Prot. The distribution of disulfide bond numbers and classes is studied according to the annotation information provided by Swiss-Prot. Based on ATDB and Swiss-Prot, two neurotoxin datasets with less than 80% sequence identity are obtained.
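As an illustration of the increment of diversity applied to dipeptide compositions, here is a minimal sketch. It assumes the commonly used definitions: for counts n_i with total N, the diversity measure is D = N ln N − Σ n_i ln n_i, and ID(X, Y) = D(X + Y) − D(X) − D(Y). A query is then assigned to the class whose pooled profile gives the smallest ID. The abstract does not state the exact variant used, and the toy sequences and the function names are invented for illustration.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 features

def dipeptide_counts(seq):
    """Count overlapping dipeptides; pairs with non-standard residues are skipped."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] for d in DIPEPTIDES]

def diversity(counts):
    """D = N ln N - sum(n_i ln n_i), with 0 ln 0 taken as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)

def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller means more similar."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)

def class_profile(sequences):
    """Pool the dipeptide counts of all training sequences in one class."""
    profile = [0] * len(DIPEPTIDES)
    for seq in sequences:
        profile = [p + c for p, c in zip(profile, dipeptide_counts(seq))]
    return profile

def predict(seq, profiles):
    """Assign the class whose profile has the smallest ID with the query."""
    query = dipeptide_counts(seq)
    return min(profiles, key=lambda label: increment_of_diversity(query, profiles[label]))

# Invented toy sequences, not taken from ATDB or any real dataset.
profiles = {
    "toxin": class_profile(["CKCCGCKCCK", "CCGKCCKCGC"]),
    "non_toxin": class_profile(["ALELALEELA", "LEALAELLEA"]),
}
label_a = predict("CKCCKCGCC", profiles)
label_b = predict("ALEELALEA", profiles)
```

The ID values computed against each class can also be kept as numeric features and passed to another classifier such as an SVM, instead of being compared directly.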
Five feature extraction methods are used in this thesis: (1) the dipeptide compositions; (2) 50 features extracted by the MRMR software; (3) the motif features discovered by MEME; (4) the motif features discovered by Prosite; (5) the motif features discovered by InterPro. Using 12 kinds of hybrid parameters as the input parameters of the ID algorithm and the NB classifier, the two datasets are predicted. The results of jackknife tests show that: (1) predictions based on the extracted motif features are better than those based on the 400 dipeptide features; (2) the best results are obtained by using the motif features together with the 50 extracted features.
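A jackknife (leave-one-out) test with a Naive Bayes classifier can be sketched as follows. This sketch assumes binary motif-presence features and a Bernoulli Naive Bayes model with Laplace smoothing. The abstract does not specify the NB variant or the feature encoding, and the toy feature vectors are invented.

```python
import math

def train_nb(samples, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model with Laplace smoothing (alpha)."""
    model = {}
    for cls in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        log_prior = math.log(len(rows) / len(samples))
        # Smoothed probability that each feature is present in this class.
        probs = [
            (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(samples[0]))
        ]
        model[cls] = (log_prior, probs)
    return model

def predict_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(cls):
        log_prior, probs = model[cls]
        return log_prior + sum(
            math.log(p) if present else math.log(1.0 - p)
            for p, present in zip(probs, x)
        )
    return max(model, key=log_posterior)

def jackknife_accuracy(samples, labels):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train_nb(train_x, train_y)
        correct += predict_nb(model, samples[i]) == labels[i]
    return correct / len(samples)

# Invented toy data: each column marks the presence (1) or absence (0)
# of a hypothetical motif in a sequence.
features = [
    [1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1],   # labelled presynaptic
    [0, 0, 1], [0, 1, 1], [0, 0, 1], [0, 0, 1],   # labelled postsynaptic
]
labels = ["pre"] * 4 + ["post"] * 4
accuracy = jackknife_accuracy(features, labels)
```

Because every sample is held out exactly once, the jackknife result for a given dataset is deterministic, unlike a randomly split cross-validation.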
Keywords/Search Tags: transcription factor binding sites, animal toxins, motif features, increment of diversity, Naive Bayes classifier