
The compositional organization of mammalian genomes: Characteristics and evolution

Posted on: 2010-03-27
Degree: Ph.D.
Type: Thesis
University: University of Houston
Candidate: Elhaik, Eran
GTID: 2440390002475433
Subject: Biology
Abstract/Summary:
The isochore theory describes the mammalian genome as a mosaic of long (≥ 300 kb) genomic regions that are fairly homogeneous in their guanine and cytosine (GC) content. The isochore theory was the first to identify the nonuniformity of nucleotide composition within vertebrate genomes. In recent years, however, the theory's methodology, terminology, and predictions have been challenged.

Here, I tested various methods used to detect compositionally homogeneous domains and their boundaries. First, I showed that the GC content of third codon positions cannot be used as a stand-in for compositionally homogeneous domains, whether isochoric or not. My conclusion was that compositionally homogeneous domains can only be identified by applying segmentation algorithms to the genome sequence. Next, I presented a benchmark for testing the performance of segmentation algorithms and found that recursive segmentation algorithms based on the Jensen-Shannon entropic divergence outperform all other algorithms. These algorithms, however, perform poorly in certain instances because of the arbitrary choice of their halting criterion.

To overcome this problem, I devised IsoPlotter, a recursive segmentation algorithm that employs a dynamic halting criterion. Segmenting the human genome with IsoPlotter revealed that two thirds of the genome is a mixture of many short, compositionally homogeneous domains and relatively few long ones, while the remainder of the genome is composed of nonhomogeneous domains.

Finally, I studied seven eutherian genomes in terms of structure, composition, and evolution. Typical eutherian genomes were found to consist mainly of short homogeneous domains, with "isochoric" domains (≥ 300 kb) covering only ∼20% of the genome. Murid genomes were exceptional in their long homogeneous domains and narrow compositional range. These findings are discussed in light of two phylogenetic hypotheses that differ on the validity of the clade Euarchontoglires. If Euarchontoglires is valid, then the unique compositional organization of murids can be explained by a compositional transition that fused many of the domains and reduced compositional variance. If the alternative hypothesis is correct, then the compositional organization of murid and platypus genomes represents an ancestral state, while the genomes of laurasiatherians and primates underwent a process of domain reduction and GC-content range expansion.
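The abstract names recursive segmentation based on the Jensen-Shannon divergence but gives no details. What follows is only a minimal sketch of that generic technique, not IsoPlotter. It makes several assumptions. The sequence is reduced to a binary GC/AT alphabet, and any non-G/C symbol, including N, is counted as A/T. Each segment is split at the point that maximizes the length-weighted Jensen-Shannon divergence between its two halves. Recursion stops when that divergence falls below a fixed `threshold` or when no split leaves both halves at least `min_len` long. The names `segment`, `best_split`, `threshold` and `min_len` are hypothetical. A fixed threshold is exactly the kind of arbitrary halting criterion the abstract criticizes, and it is not IsoPlotter's dynamic criterion.

```python
import math
from typing import List, Tuple


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a GC/AT composition with GC fraction p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(prefix: List[int], start: int, end: int, min_len: int) -> Tuple[int, float]:
    """Find the cut k in [start, end) maximizing the Jensen-Shannon divergence
    between [start, k) and [k, end); returns (-1, 0.0) if no valid cut exists."""
    n = end - start
    h_total = binary_entropy((prefix[end] - prefix[start]) / n)
    best_k, best_d = -1, 0.0
    for k in range(start + min_len, end - min_len + 1):
        n1, n2 = k - start, end - k
        p1 = (prefix[k] - prefix[start]) / n1
        p2 = (prefix[end] - prefix[k]) / n2
        # Length-weighted JS divergence: H(whole) minus weighted H(parts).
        d = h_total - (n1 / n) * binary_entropy(p1) - (n2 / n) * binary_entropy(p2)
        if d > best_d:
            best_k, best_d = k, d
    return best_k, best_d


def segment(seq: str, threshold: float = 0.01, min_len: int = 100) -> List[Tuple[int, int]]:
    """Recursively segment seq into half-open (start, end) domains.
    HYPOTHETICAL halting rule: a fixed divergence threshold plus a minimum length."""
    # Assumption: every non-G/C symbol (including N) is treated as A/T.
    gc = [1 if base in "GCgc" else 0 for base in seq]
    prefix = [0]
    for x in gc:
        prefix.append(prefix[-1] + x)
    segments: List[Tuple[int, int]] = []
    stack = [(0, len(seq))]
    while stack:
        start, end = stack.pop()
        if end - start == 0:
            continue
        k, d = best_split(prefix, start, end, min_len)
        if k == -1 or d < threshold:
            segments.append((start, end))  # treated as homogeneous
        else:
            stack.append((k, end))
            stack.append((start, k))
    return sorted(segments)
```

For example, `segment("A" * 500 + "G" * 500)` returns `[(0, 500), (500, 1000)]`, while a well-mixed sequence such as `"ACGT" * 250` stays as one segment. The exhaustive scan over cut points is quadratic overall in the worst case. Prefix sums keep each candidate cut O(1), but real implementations for whole chromosomes would need more care.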
Keywords/Search Tags: Genome, Compositional organization, Algorithms