
Studies On Recognizing Protein-coding Genes In Prokaryotic Genomes And Analyzing Genome Evolution

Posted on: 2005-01-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Y Ou
GTID: 1100360182455832
Subject: Biophysics
Abstract/Summary:
The rapidly increasing pace of bacterial genome-sequencing projects creates a need for automatic genome annotation, and one of the most important annotation tasks is recognizing the protein-coding genes in a genome. This paper describes new approaches, based on the Z curve method, for recognizing protein-coding genes and horizontally transferred genes in bacterial genomes. The Z curve is a unique three-dimensional curve representation of a given DNA sequence. A Z curve database covering more than 1310 genomes has been established and applied to gene prediction, genome visualization and related tasks. Using the Z curve method, we found that in bacterial genomes with high G+C content the distribution of ORFs shows a flower-like shape, a finding that is useful for improving gene-finding algorithms. In addition, we propose GS-Finder 1.0, a self-training algorithm that recognizes translation start sites in bacterial genomes without prior knowledge of the rRNA of the genomes concerned. We then propose ZCURVE 1.0, a new system for finding genes in bacterial and archaeal genomes. Based on the Z curve, the algorithm emphasizes the global statistical features of coding sequences by taking into account the base frequencies at the three codon positions. ZCURVE 1.0 and Glimmer 2.02 achieve comparable average accuracy; however, ZCURVE 1.0 gives more accurate gene start predictions and a lower additional prediction rate for genomes with high G+C content. We also propose ZCURVE_CoV 1.0, a program for recognizing genes in coronavirus genomes that is especially suited to SARS-CoV genomes. Instead of the frequently used sliding-window technique, we use the z'_n curve, derived from the Z curve, to delineate the G+C content along bacterial genomes.
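The Z curve itself is straightforward to compute. In its standard definition, each base moves a point one unit along each of three axes, so that after n bases the coordinates are x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n) and z_n = (A_n + T_n) - (G_n + C_n), where A_n, C_n, G_n and T_n are cumulative base counts. Because the four step vectors are distinct, the curve and the sequence determine each other uniquely. The minimal Python sketch below computes the curve, plus nine position-specific parameters (x_k, y_k, z_k for codon positions k = 1, 2, 3) of the kind used when base frequencies at the three codon positions are taken into account. The function names are hypothetical, and this is not the full ZCURVE 1.0 feature set or classifier, which the abstract does not specify.

```python
# Sketch of the Z curve and of position-specific codon statistics.
# Function names are hypothetical, chosen for illustration only.

# Each base contributes one +1/-1 step on each of the three axes:
#   x: purine (A, G) vs. pyrimidine (C, T)
#   y: amino (A, C) vs. keto (G, T)
#   z: weak H-bond (A, T) vs. strong H-bond (G, C)
STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the points (x_n, y_n, z_n) for n = 0..N, starting at the origin.

    Any character other than A/C/G/T (e.g. 'N') is treated as a zero step;
    that handling is an assumption of this sketch.
    """
    x = y = z = 0
    points = [(0, 0, 0)]
    for base in seq.upper():
        dx, dy, dz = STEP.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def codon_position_params(orf):
    """Return nine Z-curve parameters: (x_k, y_k, z_k) for codon positions k = 1, 2, 3.

    Base frequencies are computed over the complete codons of the ORF only.
    """
    orf = orf.upper()
    n_codons = len(orf) // 3
    params = []
    for k in range(3):
        bases = orf[k:3 * n_codons:3]          # every third base, starting at position k
        total = len(bases) or 1                 # avoid division by zero on empty input
        a, c, g, t = (bases.count(b) / total for b in "ACGT")
        params += [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]
    return params


print(z_curve("ACGT"))
# -> [(0, 0, 0), (1, 1, 1), (0, 2, 0), (1, 1, -1), (0, 0, 0)]
print(codon_position_params("ATGGCC"))
# -> [1.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, -1.0]
```

In a gene finder, vectors like these would be computed for every candidate ORF and passed to a classifier trained to separate coding from non-coding ORFs; that training step is not shown here.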
The wavelet transform is used to detect sharp changes in the z'_n curve. These changes mark unusual regions whose G+C content differs from that of the neighboring regions and which harbor horizontally transferred genes. The z'_n curve database provides detailed information on the putative unusual regions and alien genes identified by this approach in nine bacterial genomes. Additionally, we have constructed a Database of Essential Genes (DEG), which contains all currently available essential genes from bacterial and yeast genomes. It is suggested that essential genes are inherited vertically rather than transferred horizontally. Analysis of essential genes could help answer the question of which basic functions are necessary to support cellular life.
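The z'_n curve and the wavelet step can be sketched in the same spirit. The sketch below takes z'_n = z_n - k n, where z_n = (A_n + T_n) - (G_n + C_n) and k is a least-squares slope of z_n against n, so that a stretch richer in A+T than the genome average makes the curve rise and a G+C-richer stretch makes it fall; a boundary between such stretches appears as a kink. As a stand-in for the wavelet transform used in this work, whose wavelet and scale are not stated here, it applies a single-scale Haar wavelet to the per-base increments of z'_n and reports positions where the response is a local extremum above a threshold. The through-origin fit, the window w, the threshold and all function names are assumptions.

```python
# Sketch: z'_n curve plus a single-scale Haar detector for abrupt G+C changes.
# Function names, the window w and the threshold are hypothetical choices.


def z_prime_curve(seq):
    """Return z'_n = z_n - k*n for n = 0..N.

    z_n = (A_n + T_n) - (G_n + C_n) is the z component of the Z curve.
    k is the least-squares slope of z_n against n; fitting a line through
    the origin (k = sum(n*z_n) / sum(n*n)) is an assumption of this sketch.
    """
    z = [0]
    for base in seq.upper():
        step = 1 if base in "AT" else -1 if base in "GC" else 0
        z.append(z[-1] + step)
    num = sum(n * zn for n, zn in enumerate(z))
    den = sum(n * n for n in range(len(z))) or 1
    k = num / den
    return [zn - k * n for n, zn in enumerate(z)]


def haar_detail(curve, w):
    """Haar coefficient, at scale w, of the per-base increments of curve.

    d(n) = mean increment over (n, n+w] minus mean increment over (n-w, n],
    i.e. the change in local slope of the curve at n. A positive d(n) means
    the sequence becomes richer in A+T after position n.
    """
    return {
        n: ((curve[n + w] - curve[n]) - (curve[n] - curve[n - w])) / w
        for n in range(w, len(curve) - w)
    }


def sharp_changes(seq, w=200, threshold=0.5):
    """Positions where |d(n)| is a local maximum within +/- w and >= threshold."""
    d = haar_detail(z_prime_curve(seq), w)
    hits = []
    for n, v in d.items():
        if abs(v) < threshold:
            continue
        neighbourhood = (abs(d[j]) for j in range(n - w, n + w + 1) if j in d)
        if abs(v) >= max(neighbourhood):
            hits.append(n)
    return hits


# Synthetic example: a G+C-rich block, an A+T-rich block, then G+C again.
genome = "GC" * 500 + "AT" * 500 + "GC" * 500
print(sharp_changes(genome))  # -> [1000, 2000]
```

One design note: the k n term cancels in the Haar response, so detrending matters for plotting and reading the z'_n curve rather than for the detector itself. A real analysis would scan several scales and would still need to check candidate regions for gene content before calling them horizontally transferred.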
Keywords/Search Tags: the Z curve, bacterial and archaeal genomes, gene recognition, horizontal gene transfer, essential genes, SARS-CoV genomes