Mining The Genomic Sequence Alignments To Predict Gene Structures

Posted on: 2008-02-04  Degree: Master  Type: Thesis
Country: China  Candidate: B Tu  Full Text: PDF
GTID: 2120360272968780  Subject: Bio-IT
Abstract/Summary:
The increasing availability of data from genome sequencing projects for different species has promoted the application of comparative genomics in genome research and has made the prediction of protein-coding genes in genomic sequences more crucial. Recently, many studies have built gene structure prediction programs that use comparative genomics to achieve higher precision. However, genome sequences are so complex that the genomic sequence alignments must be mined for similarity specific to coding regions in order to improve the precision of gene prediction. Therefore, the following work was performed.

First, based on an investigation of genomic sequence alignments, several characteristics that can distinguish coding from non-coding sequences were analyzed: the extent of sequence similarity, the relationships among alignments that result from evolutionary selection, and the boundary distance between a coding region and its matching alignments. Accordingly, several similarity mining rules were designed, with the goal of reducing false-positive sequence alignments.

Second, an approach was designed for applying the similarity mining rules to GeneKey, the ab initio gene prediction system developed by our lab. A model for producing and processing sequence similarity information was created, and the exon model was modified to integrate the similarity feature. As a result, a gene prediction system that uses both sequence similarity information and statistical information was implemented.

Finally, the system was evaluated on a widely used test dataset. The results demonstrate that integrating the similarity information improved the accuracy of the gene prediction system. In addition, the system outperforms the gene prediction software TWINSCAN at both the nucleotide level and the exon level.
These results indicate that the rules for mining genomic sequence alignments for similarity, and the approach of combining similarity information with statistical information, can be effective for gene structure prediction.
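
The abstract compares systems at the nucleotide level and the exon level but does not say which measures were used. As a minimal sketch, assuming the sensitivity and specificity measures conventional in gene prediction benchmarks (where "specificity" is the fraction of predictions that are correct), the two levels could be computed as follows. The function names, the coordinate convention (1-based, inclusive), and the example exons are illustrative assumptions, not taken from the thesis.

```python
from typing import Iterable, Set, Tuple

# Hypothetical exon representation: (start, end), 1-based inclusive.
Exon = Tuple[int, int]


def coding_positions(exons: Iterable[Exon]) -> Set[int]:
    """Expand a list of exons into the set of coding nucleotide positions."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated: Iterable[Exon],
                        predicted: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) counted per coding nucleotide."""
    true_pos = coding_positions(annotated)
    pred_pos = coding_positions(predicted)
    shared = len(true_pos & pred_pos)  # coding in both annotation and prediction
    sensitivity = shared / len(true_pos) if true_pos else 0.0
    specificity = shared / len(pred_pos) if pred_pos else 0.0
    return sensitivity, specificity


def exon_accuracy(annotated: Iterable[Exon],
                  predicted: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) counted per exon.

    An exon is correct only if both of its boundaries match exactly.
    """
    true_set = set(annotated)
    pred_set = set(predicted)
    correct = len(true_set & pred_set)
    sensitivity = correct / len(true_set) if true_set else 0.0
    specificity = correct / len(pred_set) if pred_set else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up example: the second predicted exon starts 10 bases too late.
    annotated = [(101, 200), (301, 400)]
    predicted = [(101, 200), (311, 400)]
    print(nucleotide_accuracy(annotated, predicted))
    print(exon_accuracy(annotated, predicted))
```

The example illustrates why both levels are reported: a prediction with a slightly misplaced boundary still scores well per nucleotide but counts as a missed exon at the exon level.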
Keywords/Search Tags: gene structure prediction, comparative genomics, sequence similarity, features analysis, information integration