Reconstructing Posterior Distributions of a Species Phylogeny Using Estimated Gene Tree Distribution

Posted on: 2007-08-20 | Degree: Ph.D | Type: Dissertation
University: The Ohio State University | Candidate: Liu, Liang | Full Text: PDF
GTID: 1443390005475586 | Subject: Biostatistics
Abstract/Summary:

The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods to combine data, such as the concatenation method, the consensus tree method, or the gene tree parsimony method may be biased. In this dissertation, I propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions such as those that arise in a Bayesian analysis of DNA sequence data. The model employs substitution models used in traditional phylogenetics, but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to simultaneously estimate gene trees, species trees, ancestral population sizes, and species divergence times. The proposed model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. The method is applied to three multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of species trees that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.

Keywords/Search Tags: Species, Gene, Tree, Estimated