
Who Is There and What Are They Doing? An Agile and Computationally Efficient Framework for Genome Discovery and Annotation from Metagenomic Big Data

Posted on: 2018-03-16
Degree: Ph.D.
Type: Thesis
University: San Diego State University
Candidate: Silva, Genivaldo Gueiros Zacarias
GTID: 2478390020457585
Subject: Computer Science
Abstract/Summary:
Microbes are more abundant than any other biological guild, and in any environment it is important to understand which organisms are present, what they are doing, and how they are doing it. In many environments, most microbial community members cannot be cultured. Metagenomics is a powerful tool for directly probing uncultured genomes and characterizing the diversity of microbial communities using only their DNA sequences. One of the goals of metagenomics is to determine the taxonomic and functional profiles of a microbial community from unannotated shotgun sequencing reads, with valuable applications in biological research areas such as medicine, biofuels, and ecology. Currently available tools do not scale well with increasing data volumes, and this is a growing problem because both the number and the length of the reads produced by sequencing platforms keep increasing. This thesis integrates four agile and computationally efficient methods that I have developed (FOCUS, FOCUS2, Scaffold builder, and SUPER-FOCUS) to recover, scaffold, and annotate genomes from metagenomes. The framework was tested on over 500 human and ocean samples totaling over 6 TB of data, and over six thousand genomes were recovered. Each computational method presented in this dissertation opens new avenues for future metagenomic data analyses, regardless of query and database size.