A New Segmentation Algorithm Of DNA Sequences And Its Applications In The Analysis Of Genomes

Posted on: 2008-04-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Gao
GTID: 1100360245990992
Subject: Biophysics
Abstract/Summary:
With the advent of high-throughput DNA sequencing, the genomic sequences of numerous prokaryotic and eukaryotic organisms have become publicly available. Mining useful biological knowledge from these sequences remains a challenge to the biological (if not the whole scientific) community. Accumulating evidence shows that most genome sequences contain a number of turning points at which the nucleotide composition changes abruptly, and these turning points usually carry clear biological implications. This dissertation describes a new segmentation algorithm for DNA sequences and its applications in the analysis of genomes.

A new measure of the difference between two probability distributions, called the quadratic divergence, is proposed. Based on the quadratic divergence, a new segmentation algorithm is put forward that partitions a given genome or DNA sequence into compositionally distinct domains. The algorithm has been applied to the identification of isochore structure in eukaryotic genomes, the detection of CpG islands, the prediction of replication origins and termini, and the location of coding-noncoding borders. Compared with the entropic segmentation algorithm based on the Jensen-Shannon divergence, the new algorithm has a number of advantages; in particular, it is much simpler and faster. Based on the obtained results, the relationships between the G+C content and other genomic features, such as the distributions of genes and CpG islands, can be analyzed in a readily perceivable manner. The precise boundary coordinates obtained by the segmentation algorithm, together with the associated cumulative GC profile, provide a useful platform for analyzing a genome or chromosome. We have therefore developed them into GC-Profile, an interactive web-based software system that can be used to segment prokaryotic and eukaryotic genomes. GC-Profile provides a quantitative and qualitative view of genome organization. The results suggest that GC-Profile would be an appropriate starting point for analyzing the isochore structure of higher eukaryotic genomes, and an intuitive tool for identifying genomic islands in prokaryotic genomes.

Since the early 1980s, there has been great progress in the development of computational gene-finding algorithms. Some problems, however, remain unsolved. Recognizing short genes in prokaryotes or short exons in eukaryotes is one such problem. The dissertation also assesses various algorithms, including those currently available and new ones proposed here, in order to find the best algorithm for this problem. Using the databases constructed here and a standard benchmark, 19 algorithms were evaluated. Among them, the Z curve methods with 69 and 189 parameters performed best.
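
The abstract does not state the formula for the quadratic divergence, so the following is only a minimal sketch of the segmentation idea under a stated assumption: the divergence of a split is taken to be the length-weighted sum of squared base frequencies of the two parts minus that of the whole sequence. The function names `best_split` and `segment` and the halting parameter `threshold` are hypothetical and are not taken from the dissertation or from GC-Profile.

```python
from collections import Counter


def _sum_sq(counts, length, alphabet):
    """Sum of squared base frequencies for the given counts."""
    return sum((counts[b] / length) ** 2 for b in alphabet)


def best_split(seq, alphabet="ACGT"):
    """Return (position, divergence) of the split with the largest divergence.

    ASSUMPTION: divergence = w*S(left) + (1-w)*S(right) - S(whole),
    where S is the sum of squared base frequencies and w = n/N.
    """
    N = len(seq)
    if N < 2:
        return None, 0.0
    total = Counter(seq)
    s_whole = _sum_sq(total, N, alphabet)
    left = Counter()
    best_pos, best_div = None, 0.0
    for n in range(1, N):
        left[seq[n - 1]] += 1  # running counts keep the scan linear in N
        right = {b: total[b] - left[b] for b in alphabet}
        w = n / N
        div = (w * _sum_sq(left, n, alphabet)
               + (1 - w) * _sum_sq(right, N - n, alphabet)
               - s_whole)
        if div > best_div:
            best_pos, best_div = n, div
    return best_pos, best_div


def segment(seq, threshold, offset=0):
    """Recursively split seq; return sorted boundary coordinates.

    `threshold` is a hypothetical halting parameter: a segment is left
    intact when its best split has a divergence below this value.
    """
    pos, div = best_split(seq)
    if pos is None or div < threshold:
        return []
    return (segment(seq[:pos], threshold, offset)
            + [offset + pos]
            + segment(seq[pos:], threshold, offset + pos))
```

For example, `segment("A" * 50 + "G" * 50, threshold=0.1)` places a single boundary between the two compositionally distinct halves, while a compositionally uniform sequence such as `"ACGT" * 25` is left unsegmented at the same threshold.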
Keywords/Search Tags:the Z curve, isochore, cumulative GC profile, quadratic divergence, compositional segmentation, gene recognition