Boosting Strategy-based Promoter Prediction Methods

Posted on: 2010-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q S Ceng
Full Text: PDF
GTID: 2208360302458734
Subject: Computer application technology
Abstract/Summary:
Computational prediction of eukaryotic promoters is one of the most challenging problems in DNA sequence analysis. Transcription is the first step of gene expression, so regulation of transcription is an important means of regulating expression. Because the promoter determines the position of the transcription start site and the frequency with which a gene is transcribed, promoter recognition is important for genome annotation. Based on known promoter and non-promoter sequences, this thesis presents two new methods for promoter prediction.

First, a new promoter prediction method called KL-Boosting is proposed. It is based on the following hypothesis: a promoter is determined by certain word patterns, and different promoters are determined by different words. The most promising features are selected by divergence distance and used to build a sequence of weak classifiers by feature-boosting. The weak classifiers are combined into a strong classifier that achieves better performance. Unlike other classifiers, it adopts a different training and classification strategy.

Second, another promoter prediction method called PWM-Boosting is presented. It is based on the following hypothesis: a promoter is determined not only by certain word patterns but also by the positions of those patterns. Potential features of a sequence are selected by searching the diversity of the position weight matrix, and training and prediction are performed by calling the boosting algorithm of the Open Source Computer Vision Library. It adopts a training and classification strategy similar to that of KL-Boosting.

Experimental results on test samples show that KL-Boosting has moderate sensitivity and specificity and better stability. PWM-Boosting does not match KL-Boosting in sensitivity or stability, but its specificity is higher.

Finally, the two algorithms are compared on large genomic sequences with four other established algorithms (Eponine, PDF, FirstEF and PromoterInspector). The results show that both predict promoter regions efficiently, with higher sensitivity and specificity, and that each has its own characteristics.
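The abstract does not give the exact divergence measure or word length used by KL-Boosting, so the following is only a minimal sketch of the general idea of ranking word patterns by a divergence distance between promoter and non-promoter sequences. It assumes that "KL" refers to the Kullback-Leibler divergence and uses the per-word term of the symmetric form, (p - q) * log(p / q). The function names, the pseudocount, and the word length k are illustrative assumptions, not details taken from the thesis.

```python
from collections import Counter
from math import log


def kmer_counts(sequences, k):
    """Count every overlapping word of length k in a list of DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def rank_words_by_divergence(promoters, nonpromoters, k=4, pseudocount=1.0):
    """Rank words by their contribution to the symmetric KL divergence.

    Assumption: the score of a word w is (p - q) * log(p / q), where p and q
    are its smoothed frequencies in promoter and non-promoter sequences.
    The score is always >= 0 and grows as the two frequencies diverge.
    """
    pos = kmer_counts(promoters, k)
    neg = kmer_counts(nonpromoters, k)
    vocab = set(pos) | set(neg)

    # Add-one style smoothing so that unseen words do not give log(0).
    pos_total = sum(pos.values()) + pseudocount * len(vocab)
    neg_total = sum(neg.values()) + pseudocount * len(vocab)

    scores = []
    for word in vocab:
        p = (pos[word] + pseudocount) / pos_total
        q = (neg[word] + pseudocount) / neg_total
        scores.append((word, (p - q) * log(p / q)))

    # Most discriminative words first; ties broken alphabetically.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real training data.
    promoters = ["TATAAATATAAA", "GGTATAAAGG"]
    nonpromoters = ["GCGCGCGCGC", "CCGGCCGGCC"]
    for word, score in rank_words_by_divergence(promoters, nonpromoters)[:5]:
        print(word, round(score, 4))
```

In a boosting setting, each highly ranked word could then back one weak classifier (for example, a threshold on the word's frequency in a window), with the boosting procedure weighting and combining them. How the thesis actually builds its weak classifiers is not stated in the abstract.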
Keywords/Search Tags: DNA sequence analysis, divergence distance, feature-boosting, position weight matrix, Open Source Computer Vision Library