
Bioinformatics for the Comparative Genomic Analysis of the Cotton (Gossypium) Polyploid Complex

Posted on: 2016-04-29
Degree: Ph.D.
Type: Dissertation
University: Brigham Young University
Candidate: Page, Justin Thomas
Full Text: PDF
GTID: 1473390017977754
Subject: Bioinformatics
Abstract/Summary:
Understanding the composition, evolution, and function of the cotton (Gossypium) genome is complicated by the joint presence of two genomes in its nucleus (AT and DT genomes). Specifically, read-mapping (a fundamental part of next-generation sequence analysis) cannot adequately differentiate reads as belonging to one genome or the other. These two genomes were derived from progenitor A-genome and D-genome diploids involved in ancestral allopolyploidization. To better understand the allopolyploid genome, we developed PolyCat to categorize reads according to their genome of origin based on homoeo-SNPs that differentiate the two genomes. We re-sequenced the genomes of extant diploid relatives of tetraploid cotton that contain the A1 (G. herbaceum), A2 (G. arboreum), or D5 (G. raimondii) genomes. We identified 24 million SNPs between the A-diploid and D-diploid genomes. These analyses facilitated the construction of a robust index of conserved SNPs between the A-genomes and D-genomes at all detected polymorphic loci. This index can be used by PolyCat to assign reads from an allotetraploid to their genome of origin. Continued characterization of the Gossypium genomes will further enhance our ability to manipulate fiber and agronomic production of cotton.

With new whole-genome re-sequencing data from 34 lines of cotton, representing all tetraploid cotton species, we explored the evolution of the cotton genome with greatly improved resolution and improved tools, including BamBam and PolyDog. Identifying SNPs and structural variants among these 34 lines and their extant diploid relatives, we clarified phylogenetic relationships among tetraploid species, including newly characterized species, and identified introgression between different species of cultivated cotton. We explored the evolution of homoeologs in the AT- and DT-genomes and especially the phenomenon of homoeologous conversion. Homoeologous conversion is rare in cotton, perhaps due to the vast difference in chromosome sizes in the two genomes. Several regions of the genome have been introgressed between G. hirsutum and G. barbadense, resulting in superior cultivars, likely with beneficial alleles from both species and novel combinations of alleles. The genomic data provide a valuable resource for cotton researchers and breeders, who can freely access the data online at CottonGen.
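
The idea of assigning a read to its genome of origin from homoeo-SNPs can be illustrated with a minimal sketch. This is not PolyCat's actual implementation: the index layout, the function name `categorize_read`, the category labels, and the assumption of an ungapped alignment are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical homoeo-SNP index (not PolyCat's real format):
# (chromosome, 1-based position) -> (A-genome base, D-genome base)
SnpIndex = Dict[Tuple[str, int], Tuple[str, str]]


def categorize_read(chrom: str, start: int, seq: str, index: SnpIndex) -> str:
    """Assign a mapped read to the A or D genome by voting over homoeo-SNPs.

    Assumes the read aligns without gaps, so base i of the read sits at
    reference position start + i.
    """
    a_votes = 0
    d_votes = 0
    for offset, base in enumerate(seq):
        alleles = index.get((chrom, start + offset))
        if alleles is None:
            continue  # position is not a known homoeo-SNP
        a_base, d_base = alleles
        if base == a_base:
            a_votes += 1
        elif base == d_base:
            d_votes += 1
    if a_votes and d_votes:
        return "conflict"    # read carries alleles from both genomes
    if a_votes:
        return "A"
    if d_votes:
        return "D"
    return "unassigned"      # no informative homoeo-SNP overlapped


# Toy index with two homoeo-SNPs on a made-up chromosome name
toy_index: SnpIndex = {("Chr01", 103): ("G", "T"), ("Chr01", 107): ("C", "A")}
print(categorize_read("Chr01", 100, "ACGGTTACGA", toy_index))
```

In this toy case the read carries the A-genome allele at both indexed positions, so it is assigned to "A". A read that overlaps no indexed position stays "unassigned", which reflects why read categorization depends on the density of the SNP index.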
Keywords/Search Tags: Cotton, Genomes, Gossypium