Unbiased population genetic inference from high-throughput sequencing data

Posted on: 2010-06-19
Degree: Ph.D.
Type: Dissertation
University: University of California, Berkeley
Candidate: Johnson, Philip Lee Falk
GTID: 1440390002483365
Subject: Biology
Abstract/Summary:
Metagenomic sequencing projects generate short, overlapping fragments of DNA sequence, each of which derives from a different individual and from a random location in the genome. These data stand in sharp contrast to traditional population genetic samples, which consist of sequences from a small number of individuals at a small number of fixed locations throughout the genome. As a result, the high-resolution, genome-wide metagenomic data hold the potential to reveal much more information about the biology of the sampled organism. I develop two novel estimation methods that operate on these data while properly accounting for sequencing error by using quality scores during inference. In general, if population genetic inference ignores sequence quality, then the resulting estimate will be biased, with the extent of bias depending on the amount of signal (true polymorphic sites) relative to the amount of noise (false polymorphic sites generated by sequencing errors). My first estimation method applies maximum likelihood using the site-frequency spectrum to yield unbiased estimates of the scaled mutation rate, $\theta = 2N_e\mu$, and the scaled exponential growth rate, $R = N_e r$. The second method applies maximum composite-likelihood using pairs of sites to estimate the scaled recombination rate, $\rho = 2N_e c$. Given the genome-wide nature of metagenomic data, this estimator will be able to detect hitherto-unknown recombination hotspots in microbial populations. The methods are tested using simulated data and then, as a proof-of-concept, briefly applied to data from a metagenome project that sampled a population of Accumulibacter phosphatis from activated sludge.
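The bias described in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the dissertation's likelihood method; it swaps in the classical Watterson estimator and assumes a uniform Phred quality score, a fixed read depth, and one individual per read, purely to show how false polymorphic sites generated by sequencing errors inflate a naive estimate of theta. All function names and the toy parameter values are hypothetical.

```python
def phred_to_error_prob(q):
    """Standard Phred definition: error probability = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def watterson_a(n):
    """Watterson's constant a_n = sum_{i=1}^{n-1} 1/i for a sample of size n."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(num_segregating, sample_size, num_sites):
    """Per-site Watterson estimate: S / (a_n * L)."""
    return num_segregating / (watterson_a(sample_size) * num_sites)


def expected_false_segregating(num_sites, depth, q):
    """Expected number of monomorphic sites that appear polymorphic because
    at least one of `depth` reads carries a sequencing error (assumption:
    errors are independent and every base has the same quality q)."""
    e = phred_to_error_prob(q)
    return num_sites * (1 - (1 - e) ** depth)


# Toy values (assumptions, not taken from the dissertation).
num_sites = 100_000   # sites covered
depth = 10            # reads per site, each from a different individual
true_theta = 0.001    # per-site scaled mutation rate
quality = 20          # uniform Phred score, i.e. 1% error per base

# Expected number of true segregating sites under the neutral model.
true_s = true_theta * watterson_a(depth) * num_sites
# Treat nearly all sites as monomorphic when counting error-induced sites.
false_s = expected_false_segregating(num_sites - true_s, depth, quality)

naive_theta = watterson_theta(true_s + false_s, depth, num_sites)
print(f"true theta:  {true_theta:.5f}")
print(f"naive theta: {naive_theta:.5f}")
```

Under these assumptions the error-induced sites far outnumber the true polymorphic sites, so the naive estimate exceeds the true value by a wide margin. This is the signal-to-noise dependence the abstract describes, and the motivation for folding quality scores into the likelihood rather than ignoring them.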
Keywords/Search Tags:Data, Population, Sequencing, Inference