Font Size: a A A

Research Of Phylogenetic Tree Reconstruction Based On Pairwise Sequences Alignment

Posted on:2009-01-30Degree:MasterType:Thesis
Country:ChinaCandidate:F Y WangFull Text:PDF
GTID:2178360242494177Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Phylogenetic analysis is one important field in bioinformatics; its main task is to reconstruct a phylogenetic tree from a group of homologous DNA or protein sequences, showing the evolutionary relationship between those sequences. There are mainly three types of tree-building methods: distance, parsimony and likelihood. This thesis will do some exploratory researches to improve the distance method.The traditional distance method usually computes distance matrices by multiple sequences alignment, which is a NP-hard problem not feasible to get the best alignment for a large number of sequences. One solution is to apply heuristic algorithms for multiple sequences alignment, the other is to compute some distances between pairwise sequences instead of multiple sequences alignment. For the method to compute pairwise diantances, Hansan H.Out and Khalid Sayood proposed a LZ complexity based distance, which can reconstruct some Phylogenetic trees correctly. The thesis tries to avoid the difficulty of multiple sequences alignment by a normalized edit distance sastifying the triangle inequality, but still needs pairwise sequences alignment.Needle-Wunschman algorithm is commonly used for pairwise sequences alignment, both its time and space complexities are O (mn)(m, n stand for the lengths of two sequences), sometimes being out of memory for long sequences. The memory problem can be solved by Hirschberg algorithm, which has a linear space complexity but roughly doubles the running time as O (2 mn). Using a new way to compute check points, this thesis proposes a block recursive sequences alignment algorithm by trying to compromise the complexities between time and space. Theorical analysis shows its space complexity is roughly between O (5( m + n ) + Ls×min( m - 1, n - 1) + C2) and , with its time complexity between and in general cases but between and in the case for sequences of high similarities. Some experiments in aligning genomes from homology species have further shown that it runs correctly and at least 10% faster than the Hirschberg algorithm if the normalized edit distance between the two compared sequences is less than 0.25. Therefore, the algorithm is useful in real applications such as homology sequences analysis, phylogenic tree construction, etc. O (5( m + n ) + Ls×( m + n - 2) +C2)O (1.5 mn)O (3m n )O (1.5 mn )O (2 mn)The distances directly computed from pairwise sequences alignment are subjected to sequence lengths and not good representations of real evolutionary phenomena, so they have to be normalized to reduce the influence of sequence lengths in building phylogenetic trees. The normalized edit distance used in this thesis is a metric valued in [0, 1], and it can avoid some problems related to negative branch lengths in phylogenetic trees because the triangle inequality is completely sastified. Finally, this thesis shows in two groups of experiments that the normalized edit distance can successfully reconstruct certain phylogenetic trees proven true by other methods.
Keywords/Search Tags:phylogenetic tree, normalized edit distance, block recursive, time complexity, space complexity
PDF Full Text Request
Related items