Computational methods for bacterial characterization and bacteria-host/environment interaction analyses

Posted on: 2013-06-14
Degree: Ph.D
Type: Dissertation
University: University of Missouri - Columbia
Candidate: Zhang, Chao
Full Text: PDF
GTID: 1450390008976189
Subject: Computer Engineering
Abstract/Summary:
As the largest domain of living organisms on Earth, bacteria are estimated to number more than five nonillion (10^30) individuals worldwide [1], far more than previous estimates of the total number of bacteria [2]. These single-celled organisms can be found everywhere, e.g., in the deep sea, hot springs, the human gut, and even radioactive waste [3]. Because bacteria are so closely connected to human life, we cannot live without them, and we benefit from these microorganisms in many areas, e.g., food production, human health [4], environmental sciences [5], and the chemical industry [6, 7]. On the other hand, pathogenic bacteria are one of the most serious threats to human life. For example, tuberculosis, the most common fatal bacterial disease, kills about 2 million people every year [8]. Since 1676, when Antonie van Leeuwenhoek first observed bacteria, scientists have never stopped exploring the micro-world. Identifying and classifying bacteria remains challenging because bacteria are invisible to the naked eye and cannot be easily differentiated morphologically. During the past two decades, DNA sequencing technologies have become a powerful tool for taking up this challenge.

In 1995, when John Craig Venter had just begun sequencing the first bacterial genome, Haemophilus influenzae [9], DNA sequencing was extremely difficult and time-consuming. The common view at the time was that a few dozen representative genomes would suffice to build a gene pool of the whole microbial community. Today, thanks to new sequencing technologies, more than 1600 microbial whole-genome sequences have been released, and many more bacterial genome-sequencing projects are ongoing [10].
With the accumulation of bacterial genomic data, the focus of microbial genomics (the study of the genomes of microorganisms, including archaea, bacteria, and fungi) is shifting from the single genome to the pan-genome (the gene pool of a particular species) and the meta-genome (an environmental gene/species pool). However, the explosion of data has not answered all of the questions that researchers in this field are asking. It has become evident that these data reveal only the tip of the iceberg of the bacterial world. In-depth analysis of these data is needed to better understand the genome diversity and dynamics of bacteria, the interactions between bacteria and their hosts/environments, and the pathogenicity of pathogens. Meanwhile, the unprecedented amount of genome data also poses major challenges for computational analysis, an essential tool for microbial genomics. In fact, computational methods for massive genomic sequence analysis have become a bottleneck in microbial genomics.

In this dissertation, we focus on computational methods for discovering interactions between bacteria and their hosts/environments and for bacterial characterization (i.e., identification and classification), based on sequencing data and taking the bacteria's hosts and environments into account. While this topic has been raised in recent publications [11-16], no in-depth review has been presented. Bacterial identification by detecting variations in genome sequences across different species/genera is an essential step in analyzing genomic data, especially metagenomic data. Thus, in chapter 2, we first review existing computational tools for bacterial identification and their limitations. Because bacteria evolve rapidly in response to their environments, bacterial adaptations to different environments/hosts are reflected in their genome sequences. Many bacteria, even those belonging to the same species, show extensive genomic plasticity and diverse pathogenicity. For example, three different E. coli strains (the laboratory strain E. coli MG1655, the enterohemorrhagic strain E. coli EDL933, and the uropathogenic strain E. coli CFT073) share only 39.2% of their genes [17]. Thus, in chapter 3 of this dissertation, we assess practical computational methods for detecting sequence variations among bacteria of a given species living in different environments. In chapter 4, we dissect the evolutionary dynamics of bacterial virulence and review methods for identifying genetic markers in bacterial DNA sequences that are associated with a disease or host. In chapter 5, based on our observations and work in chapter 4, we predict some novel effectors for known pathogens. The last chapter summarizes the dissertation.
Keywords/Search Tags: Bacteria, Computational methods, Chapter, Data