
Bioinformatics Prediction Of Genes And Their Functional Elements

Posted on: 2010-06-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Ma
Full Text: PDF
GTID: 1100330338488314
Subject: Bio-IT
Abstract/Summary:
Predicting genes and their functional elements is the first step after a genome has been completely sequenced, and it is valuable for finding new disease genes and understanding the mechanisms of gene regulation. Although several methods have been proposed, their prediction accuracy is still unsatisfactory. This dissertation focuses on improving the prediction of genes and their functional elements using bioinformatics strategies.

For the accurate prediction of alternatively spliced cassette exons (SCEs), three evolutionary conservation-based features were mined from SCEs and their flanking intronic regions using comparative genomics data. Test results indicate that these evolutionary conservation-based features have higher discriminative power than 36 widely used sequence-based features.

For the identification of transcribed regions from EST-genome alignments, several measures were introduced to eliminate spurious EST alignments: Length Check, Direction Check, Gap Check, Identity Check, Coverage Check, Terminal Check and Multi-alignments Check. On this basis, a computational tool named ESTCleanser was developed to identify true EST alignments and thereby obtain reliable transcribed regions. ESTCleanser was evaluated on the well-annotated ENCODE testing regions of the human genome. The results show that ESTCleanser achieves higher accuracy at the exon and intron levels than the UCSC spliced EST alignments.

For the accurate recognition of protein-coding genes, a new gene finder named GeneAnnotator was developed, which incorporates EST alignments and genomic DNA sequence in a two-level strategy. Test results show that GeneAnnotator outperforms the well-known gene predictor AUGUSTUS.

Pre-miRNAs were predicted by combining a target-based approach, a filter-based approach and a machine-learning method. The effectiveness of this method was demonstrated by predicting pre-miRNAs across the whole Drosophila melanogaster genome.

For the accurate prediction of promoters, a new method named ComPromoter was proposed to combine the predictions of several promoter predictors. ComPromoter integrates several features, including the relationship between the occurrence frequencies of ProKey predictions and voting distances, and the relationship between the occurrence frequencies of ProKey predictions and prediction scores. Evaluation on the human ENCODE testing regions showed that ComPromoter achieves a higher Pearson correlation coefficient (CC) than any of the combined promoter predictors.
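As a rough illustration of the CC metric mentioned above: when both the prediction and the reference annotation are binary labels, the Pearson correlation coefficient reduces to a closed form over the four confusion counts. The sketch below assumes that setting. The abstract does not say how true and false positives are counted for promoters (per nucleotide, per window, or within some distance of an annotated site), so the function names and the per-position labelling are illustrative assumptions, not the dissertation's evaluation protocol.

```python
import math


def confusion_counts(predicted, actual):
    """Count TP, FP, TN, FN for two equal-length sequences of 0/1 labels.

    Assumption: one label per evaluated unit (e.g. a position or window).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp, fp, tn, fn


def correlation_coefficient(tp, fp, tn, fn):
    """Pearson correlation between binary predictions and binary annotations."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when a row or column of the confusion table is empty;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    actual = [1, 0, 0, 0, 1, 1]
    print(correlation_coefficient(*confusion_counts(predicted, actual)))
```

A value of 1 means the predictions match the annotation exactly, 0 means no linear association, and -1 means every label is inverted.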
Keywords/Search Tags:genome annotation, protein-coding gene, miRNA, alternative splicing, promoter prediction