The use of alignment-free statistics for the evolutionary study of 5' cis-regulatory sequences

Posted on: 2012-10-19
Degree: M.S.
Type: Thesis
University: University of Southern California
Candidate: Li, Jing
GTID: 2450390011452057
Subject: Statistics
Abstract/Summary:
Phylogenetic tree reconstruction is important for understanding the evolutionary history of sequences. Traditionally, it requires constructing a multiple sequence alignment (MSA) of the sequences. For gene regulatory regions, however, multiple sequence alignments often do not work well because most parts of the sequences have diverged. In this thesis, we focus on using alignment-free, statistics-based distance measures to infer the evolutionary relationships of 5' cis-regulatory sequences. Based on the alignment-free statistic D2 and its two variants D2* and D2S, we develop the corresponding k-tuple distance measures for phylogenetic tree reconstruction. Through simulations we show that the accuracy of our k-tuple distance measures depends on the extent to which the evolving sequences have diverged from their most recent common ancestor. Because simulations cannot capture all the complexities of real sequence evolution, it is essential to analyze real data as well. We therefore collect genome-wide 5' cis-regulatory sequences of orthologous vertebrate genes for real-data studies. Our three k-tuple distance measures achieve high accuracy, measured as the percentage of genes whose 5' cis-regulatory sequence trees are consistent with the reference species tree. Comparisons of the distance measures across different sub-regions of the real sequences support the reliability of those measures. Our results also suggest that a tuple size of k = 5 or k = 7 may be optimal for the evolutionary study of 5' cis-regulatory sequences. Among the three statistics, the D2S-based distance measure performs best on real data sets, although it does not outperform MSA-based methods. In summary, our analysis indicates that alignment-free k-tuple distance measures can be a useful alternative for phylogenetic tree reconstruction, complementing MSA-based methods.
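As an illustration of the k-tuple idea, the basic D2 statistic between two sequences is the inner product of their k-tuple count vectors. The sketch below computes it in Python. The function names are hypothetical, and the cosine-style normalization in `d2_dissimilarity` is an assumption made for illustration only; the abstract does not state how the thesis converts D2, D2* or D2S into distances, and the two variants (which additionally require a background sequence model) are not shown.

```python
from collections import Counter
from math import sqrt


def ktuple_counts(seq, k):
    """Count every overlapping k-tuple (k-mer) in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """D2 statistic: inner product of the two k-tuple count vectors."""
    counts_a = ktuple_counts(seq_a, k)
    counts_b = ktuple_counts(seq_b, k)
    # Counter returns 0 for k-tuples absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


def d2_dissimilarity(seq_a, seq_b, k):
    """Illustrative dissimilarity in [0, 1] derived from D2.

    ASSUMPTION: this cosine-style normalization is not taken from the
    thesis; it only shows how a count-based statistic can be turned
    into a pairwise measure for tree building.
    """
    norm = sqrt(d2(seq_a, seq_a, k) * d2(seq_b, seq_b, k))
    if norm == 0:
        raise ValueError("both sequences must contain at least one k-tuple")
    return 1.0 - d2(seq_a, seq_b, k) / norm
```

A pairwise matrix of such values over a set of orthologous regulatory sequences could then be passed to a distance-based tree method such as neighbor joining; which tree method the thesis uses is not stated in the abstract.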
Keywords/Search Tags:Sequences, K-tuple distance measures, Tree reconstruction, Evolutionary, Alignment-free, Phylogenetic