The Development Of Efficient Strategies For Resolving Contentious Phylogenetic Relationships In Phylogenomic Studies

Posted on: 2018-02-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Y Che
Full Text: PDF
GTID: 1310330536476268
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
With the advent of high-throughput sequencing, molecular phylogenetics has entered the era of phylogenomics, in which genome-scale data are used to reconstruct the evolutionary history of organismal groups. Phylogenomic data provide unprecedented opportunities to resolve challenging species phylogenies. However, recent in-depth studies have revealed that larger character sets contribute both phylogenetic signal and noise. For contentious nodes in the Tree of Life (e.g., the phylogenetic relationships of rapidly radiating clades), merely adding more sequences does not necessarily solve the problem. Even worse, phylogenetic noise may become dominant and yield statistically well-supported but misleading phylogenetic inferences. In this study, we developed new data filtering methods to increase the signal-to-noise ratio and introduced a more informative data type for cases of rapid radiation, thereby improving tree reconstruction for long-standing controversies.

I. The backbone phylogeny of jawed vertebrates inferred from phylogenomic data sets. In the genomic era, how best to analyze massive data sets, dissect phylogenetic signal, and examine systematic sources of error in order to assess the robustness of the resulting estimates remain important open questions for phylogenomic analysis. To improve the signal quality of their data, phylogenomic studies normally adopt data filtering approaches such as reducing missing data or using slowly evolving genes. However, few empirical studies have compared how well these approaches improve data quality. Several nodes within the backbone phylogeny of jawed vertebrates remain contentious despite the use of considerable sequence data, which makes this phylogeny an appropriate model for investigating how different data filtering methods perform when several difficult phylogenetic questions must be resolved simultaneously. Here, we assembled a phylogenomic data set of 58 jawed vertebrate taxa and 4682 genes, based on ten newly generated transcriptomes combined with publicly available genomic and transcriptomic data, to investigate the backbone phylogeny of jawed vertebrates. To evaluate how efficiently different data filtering methods extract phylogenetic signal, we chose six highly intractable internodes within this backbone phylogeny as our test questions. We found that our phylogenomic data set exhibits substantial conflicting signal among genes for these questions. Our analyses showed that existing data filtering methods may be inefficient at increasing the signal-to-noise ratio when a phylogeny contains several difficult nodes. Non-specific data sets produced inconsistent results, and phylogenetic accuracy based on non-specific data was considerably influenced by the size of the data set and the choice of tree inference method.

To address these issues, we propose that question-specific data filtering, which selects genes that resolve a given internode rather than the entire phylogeny, could be an efficient way to improve data quality. It can be implemented in two ways. We refer to the first question-specific strategy as the “hypothesis-control” approach: it removes genes whose gene trees do not support any of the predefined hypotheses for a given question. The second question-specific strategy is the “node-control” approach, in which we select only genes whose gene trees recover a specific node related to the question. Notably, this strategy not only yields correct relationships for the question but also reduces the inconsistency associated with data size and inference method. Using the largest current data set, comprising 4682 genes, our in-depth phylogenomic analyses produced a reliable framework for the backbone phylogeny of jawed vertebrates. Our study highlights the importance of gene selection in phylogenomic analyses and suggests that simply using a large amount of data cannot guarantee correct results. Constructing question-specific data sets may be more powerful for resolving problematic nodes.

II. Phylogenomic analyses and improved resolution of Laurasiatheria. Phylogenetic relationships in rapid radiations are notoriously difficult to resolve, because short divergence intervals provide very little phylogenetic signal about branching order and that signal can easily be confounded by non-phylogenetic signal in the data. Traditionally, given the availability of data and the complexity of phylogenetic inference, analyses of rapid radiations have relied mainly on relatively slowly evolving coding sequences (CDS) rather than fast-evolving noncoding sequences. Laurasiatherian lineages present a classic example of rapid radiation, and the interordinal relationships of Laurasiatherian mammals are currently among the most controversial questions in mammalian phylogenetics. In this study, by mining public genome data, we compiled an intron data set of 3,638 genes (19,055,073 bp) and a CDS data set of 10,259 genes (20,994,285 bp) to investigate the phylogeny of Laurasiatheria under both concatenation and coalescent-based frameworks, covering all major lineages of Laurasiatherian mammals except Pholidota. We found that the intron data contained more homogeneous and stronger phylogenetic signal than the CDS data. In agreement with this observation, concatenation and species-tree analyses of the intron data set yielded well-resolved and identical phylogenies, whereas the CDS data set produced weakly supported and incongruent results.

To investigate whether the well-resolved phylogeny inferred from the intron data set was an artifact of systematic error, we generated multiple data subsets from the original intron data set using different data filtering criteria. For comparison, we generated similar subsets of the CDS data set using the same criteria. In addition, we evaluated the sensitivity of both the intron and CDS data sets to outgroup selection. These further analyses showed that the phylogeny inferred from the intron data is highly robust to data subsampling and to changes in outgroup, whereas the CDS data produced highly unstable results under the same conditions. Interestingly, gene-tree statistics showed that the most frequently observed gene-tree topologies for the CDS and intron data are identical, suggesting that the major phylogenetic signal within the CDS data is actually congruent with that within the intron data. Using the largest current intron data set, our phylogenetic analyses recovered a statistically well-supported phylogenetic framework for Laurasiatheria: Chiroptera and Perissodactyla form a well-supported clade that is sister to the clade comprising Cetartiodactyla and Carnivora, representing a step towards resolving the long-standing “hard” polytomy. In addition, our study argues that genomic intron data are a promising resource for resolving rapid radiation events across the tree of life.
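
The two question-specific filters described in Part I can be illustrated with a minimal sketch. This is not the author's implementation: the function names (`clades`, `hypothesis_control`, `node_control`), the toy taxa A to D, and the example gene trees are all hypothetical. The sketch also assumes that gene trees are rooted Newick strings, that every gene contains every taxon, and that "supporting a hypothesis" or "recovering a node" means the gene tree contains that exact clade.

```python
def clades(newick):
    """Return all clades (frozensets of leaf names) in a rooted Newick string.

    Branch lengths and internal-node labels (e.g. support values) are ignored.
    """
    stack, found = [], set()
    token, after_close = "", False
    for ch in newick:
        if ch in "(),;":
            name = token.split(":")[0].strip()
            # A token that follows ')' is an internal label, not a leaf.
            if name and not after_close and stack:
                stack[-1].add(name)
            token, after_close = "", False
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                done = stack.pop()
                found.add(frozenset(done))
                if stack:
                    stack[-1] |= done  # leaves also belong to the parent clade
                after_close = True
        else:
            token += ch
    return found


def hypothesis_control(gene_trees, hypotheses):
    """Keep genes whose tree contains the clade of at least one hypothesis."""
    wanted = [frozenset(h) for h in hypotheses]
    return [gene for gene, tree in gene_trees.items()
            if any(h in clades(tree) for h in wanted)]


def node_control(gene_trees, node):
    """Keep genes whose tree recovers the given node (clade) exactly."""
    target = frozenset(node)
    return [gene for gene, tree in gene_trees.items()
            if target in clades(tree)]


# Hypothetical toy data: is A sister to B (H1) or to C (H2)?
GENE_TREES = {
    "g1": "(((A,B),C),D);",   # supports H1 and recovers node {A,B,C}
    "g2": "(((A,C),B),D);",   # supports H2 and recovers node {A,B,C}
    "g3": "(((A,D),B),C);",   # supports neither hypothesis
    "g4": "((A,B),(C,D));",   # supports H1 but does not recover {A,B,C}
}
HYPOTHESES = [{"A", "B"}, {"A", "C"}]
NODE = {"A", "B", "C"}

print(hypothesis_control(GENE_TREES, HYPOTHESES))  # ['g1', 'g2', 'g4']
print(node_control(GENE_TREES, NODE))              # ['g1', 'g2']
```

In this toy case the two filters keep different gene sets: hypothesis-control only discards genes that contradict every candidate answer, while node-control keeps genes that recover the surrounding node regardless of which hypothesis they favor. Real data would additionally need handling of missing taxa and unrooted trees, which this sketch leaves out.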
Keywords/Search Tags: phylogenomics, phylogenetic signal, phylogenetic noise, intron, phylogenomic subsampling