Analysis of chloroplast genomes databases and relationships using whole genome informatics tools

Posted on: 2005-03-12
Degree: Ph.D.
Type: Dissertation
University: George Mason University
Candidate: Kilel, Beatrice
Full Text: PDF
GTID: 1453390008480505
Subject: Chemistry
Abstract/Summary:
The post-genomic era has dawned with many genomes fully sequenced, but with few ways to manage and analyze them or to make inferences that compare structure and function. Although many of the technical issues of sequencing complete genomes have now been solved, informatics tools for data analysis have not been fully exploited. Comparative analysis of the fully sequenced chloroplast genomes is expected to open many opportunities for understanding plant operations and relationships. The premise of this endeavor is to create a centralized database of the complete genomes so that the data are easily managed and manipulated. Keeping these data current requires annotation and its subsequent automation, a process that is at present heterogeneous, unavailable, or suboptimal.

In this work, I proposed and investigated the control of data heterogeneity by designing a centralized chloroplast genome database in the MySQL DBMS and populating it with complete plant and protist chloroplast genome sequences from GenBank. This centralized database allows data queries to be performed and their results compared. I then devised an automated data extraction and update tool to track new information added to GenBank and PubMed as more chloroplast genomes are fully sequenced. Genotator, Artemis and BlastP were used to re-annotate sequence data with poor or suboptimal annotation so that the data reflect current literature and citations. In the end, I found that the total number of putative genes in the older sequences was reduced, and I was able to assign function to some of the genes.

Comparative analysis, which promises to be useful in studying organisms across the species divide, was performed using some of the currently available web-based computational tools. These tools were used to elucidate gene order, gene content, phylogeny and visual relatedness in the genome data. This was possible because the chloroplast genome is highly conserved: most land plant species share the same general structure and gene order, with a small genome size of 120–220 kb and 120–150 genes.

The significance of the results of this study is that they provide a better understanding of genome rearrangements and of the location of homologous genes in genomes that have not yet been sequenced, and thus a means to investigate mechanisms of genome evolution. Elucidating related genes at the whole-genome level allows information to be extrapolated across species with much greater certainty. This may also find application in transgenic plants, where genes are introduced into the host organism through the chloroplast instead of the nucleus.
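As a rough illustration of the kind of query a centralized genome database makes possible, the sketch below compares gene content and gene order between two genomes. It is not the dissertation's schema or tooling: the table and column names, the helper functions, and the records (accessions `DEMO_A` and `DEMO_B`) are hypothetical placeholders, and Python's built-in `sqlite3` module stands in for MySQL so the example runs without a server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy genome/gene schema (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genome (
            genome_id INTEGER PRIMARY KEY,
            accession TEXT UNIQUE NOT NULL,
            organism  TEXT NOT NULL
        );
        CREATE TABLE gene (
            gene_id   INTEGER PRIMARY KEY,
            genome_id INTEGER NOT NULL REFERENCES genome(genome_id),
            name      TEXT NOT NULL,
            start_pos INTEGER NOT NULL
        );
        """
    )
    # Placeholder records for illustration only; not real annotations.
    conn.executemany(
        "INSERT INTO genome VALUES (?, ?, ?)",
        [(1, "DEMO_A", "placeholder organism A"),
         (2, "DEMO_B", "placeholder organism B")],
    )
    conn.executemany(
        "INSERT INTO gene (genome_id, name, start_pos) VALUES (?, ?, ?)",
        [(1, "psbA", 100), (1, "rbcL", 500), (1, "matK", 900),
         (2, "rbcL", 200), (2, "psbA", 700)],
    )
    conn.commit()
    return conn


def shared_genes(conn, acc_a, acc_b):
    """Gene content: names of genes annotated in both genomes, sorted by name."""
    rows = conn.execute(
        """
        SELECT ga.name FROM gene ga
        JOIN genome a ON a.genome_id = ga.genome_id
        WHERE a.accession = ?
        INTERSECT
        SELECT gb.name FROM gene gb
        JOIN genome b ON b.genome_id = gb.genome_id
        WHERE b.accession = ?
        """,
        (acc_a, acc_b),
    ).fetchall()
    return sorted(r[0] for r in rows)


def gene_order(conn, accession):
    """Gene order: gene names of one genome sorted by start coordinate."""
    rows = conn.execute(
        """
        SELECT g.name FROM gene g
        JOIN genome s ON s.genome_id = g.genome_id
        WHERE s.accession = ?
        ORDER BY g.start_pos
        """,
        (accession,),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(shared_genes(conn, "DEMO_A", "DEMO_B"))
    print(gene_order(conn, "DEMO_A"))
    print(gene_order(conn, "DEMO_B"))
```

Keeping genomes and genes in separate tables linked by a key means each new genome is added as rows rather than as a new file format, which is one plausible way to reduce the data heterogeneity the abstract describes.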
Keywords/Search Tags: Genome, Chloroplast, Data, Fully sequenced, Genes, Tools