
New Studies On Recognition Of Protein-coding Genes And DNA Sequence Analysis In Prokaryotic Genomes

Posted on: 2008-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lin
GTID: 2120360245491239
Subject: Biophysics
Abstract/Summary:
The current flood of sequence data means that many of the challenges in biology are now challenges in theoretical computation. Bioinformatics has firmly established itself as a discipline within molecular biology and encompasses a wide range of subject areas. Identifying protein-coding genes in microbial genomes is one of its most important tasks. This dissertation describes a modest improvement in the recognition of protein-coding genes in bacterial genomes using the Z curve method.

The first part of the dissertation introduces the main topics of current bioinformatics research and briefly reviews the background of gene recognition in prokaryotes.

The second part describes the re-annotation of the protein-coding genes in the Bacillus cereus ATCC 10987 genome through the joint application of the Zcurve and Glimmer programs. To verify the additional ORFs that are not included in the original annotation, we also use BLAST database searches for better accuracy. As a result, the number of re-annotated protein-coding genes in the Bacillus cereus ATCC 10987 genome is found to be 5180, which is clearly fewer than the 5603 in the RefSeq annotation and is considered more reliable. These genes can then serve as the basis for further study of the biology of related organisms.

The third part proposes the application of the Z curve method to the recognition of protein-coding genes in prokaryotic genomes. Based on the Z curve theory of DNA sequences, an ab initio bacterial gene-finding program, Zcurve 2.0, has been developed; its new feature is the use of the SVM algorithm to classify ORFs as coding or non-coding. In a comprehensive comparison with Zcurve 1.02 and Glimmer 3.02 on 419 chromosomes, Zcurve 2.0 is found to have the same accuracy as Zcurve 1.02 or Glimmer 3.02 and a much lower additional prediction rate than Zcurve 1.02. In addition, Zcurve 2.0 is easy to run and is faster than Glimmer 3.02. The joint application of both systems is shown to greatly improve gene-finding results.

The fourth part gives a brief introduction to the updates of the Z-curve Database and the Database of Essential Genes (DEG). Z-curve Database 2.1 provides a useful platform for analyzing genome data in a readily perceivable manner, and DEG 3.0 provides a basis for developing algorithms to predict essential genes.
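As an illustration of the Z curve representation that the abstract relies on, the following minimal Python sketch computes the standard cumulative Z curve coordinates of a DNA sequence: x counts purines minus pyrimidines, y counts amino minus keto bases, and z counts weak minus strong hydrogen-bond bases. The names `z_curve` and `DELTAS` are hypothetical and chosen only for this example. The abstract does not specify the feature set or the SVM setup used in Zcurve 2.0, so this shows only the underlying transform, not the program's method.

```python
from typing import Dict, List, Tuple

# Per-base increments (dx, dy, dz) of the standard Z curve transform:
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak H-bond (A, T) vs strong H-bond (G, C)
DELTAS: Dict[str, Tuple[int, int, int]] = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Return the cumulative Z curve points (x_n, y_n, z_n) for n = 1..len(seq)."""
    x = y = z = 0
    points: List[Tuple[int, int, int]] = []
    for base in seq.upper():
        if base not in DELTAS:
            # Ambiguous symbols such as N are rejected in this sketch.
            raise ValueError(f"unsupported base: {base!r}")
        dx, dy, dz = DELTAS[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    for n, point in enumerate(z_curve("ATGGCGTAA"), start=1):
        print(n, point)
```

Each point is a running total, so the curve can be plotted directly in three dimensions or reduced to summary values for a given ORF before any classifier is applied.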
Keywords/Search Tags:the Z curve, bacterial and archaeal genomes, gene recognition, re-annotation, database