Font Size: a A A

Research And Implementation Of Phylogenetic Multifurcating Tree Reconstruction

Posted on:2008-09-28Degree:MasterType:Thesis
Country:ChinaCandidate:L Y XuFull Text:PDF
GTID:2178360215994861Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Molecular phylogenetic analysis is one of the most important fields of Bioinformatics, the main task of which is to reconstruct a phylogenetic tree from a group of homologous DNA or protein sequences, showing the evolutionary relationship between the species of those sequences. Usually, a phylogenetic tree is a binary tree, in which the leaf nodes stand for the species or the organisms, the tree topology indicates the phylogenetic relationship, and the length of branches figure out the evolutionary distance between the species and their common ancestor. There are three main types of methods to reconstruct phylogenetic trees: distance matrix, parsimony and likelihood. Distance matrix method has wide applications because of its simplicity and solid theory.However, it has been reported that some distance matrix methods, e.g. Unweighted Pair-Group Method using Arithmetic averages (UPGMA), may produce multiple phylogenetic trees ("tied trees") given a single set of homologous sequences in certain cases, depending on the input order of the sequences. Although UPGMA is designed to produce single trees, it sometimes generates tied trees in cases that there are more than one pairs of sequences with the minimal distance, because it is not clear for UPGMA to determine which of such pairs should be selected. To interpret different results from the same group of sequences is difficult, and most popular softwares for phylogenetic analysis do not handle this problem but just to give a random result depending on the specific implementation.In order to solve the"tied trees"problem, we first analyze the reason for UPGMA to derive multiple results sometimes in detail, and then present an improved method for UPGMA, namely, Unweighted Multi-Group Method using Arithmetic averages (UMGMA). UMGMA extends UPGMA in dealing with the minimal elements in distance matrix. In fact, it chooses all the elements less than a tolerant parameterτ, to construct new OTUs, in a different way from UPGMA. It has been shown that UMGMA may produce a unique multifurcating tree which integrates all the possible tied UPGMA trees. In case UPGMA produces a unique tree, the UMGMA tree (τ=0) is always the same as the corresponding UPGMA tree. Experiments on real sequences have shown that UMGMA can not only produce phylogenetic trees with a unique topology, but also produce trees at different levels. This means that UMGMA can gradually make clear the overall structure of trees by adjusting the tolerant parameterτin the large-scale phylogenetic analysis.In this thesis, we also propose a new molecular phylogenetic analysis system– Multi-Tree, which integrates both UMGMA and UPGMA. The system is a rich client application base on Microsoft .Net framework 2.0, with a web service support to perform multiple sequences alignment. Using it, one can do some basic phylogenetic analysis, including alignment of sequences, calculation of distance matrix, construction of phylogenetic trees, and so on. Compared with many other software tools for phylogenetic analysis, the Multi-Tree system has a good and friendly User Interface (UI) that supports multi-languages, and it can be easily extended and maintained in the plug-in structure that can load UI and business logic components at runtime.
Keywords/Search Tags:molecular phylogenetic analysis, distance matrix method, uniqueness, multifurcating tree
PDF Full Text Request
Related items