The Research On Construction Of Phylogenetic Tree Based On Discrete Measure

Posted on: 2011-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhou
GTID: 2230330395485436
Subject: Computer Science and Technology
Abstract/Summary:
Phylogenetic analysis is an important field in bioinformatics; its main task is to reconstruct a phylogenetic tree from a group of homologous DNA or protein sequences, showing the evolutionary relationships among those sequences. There are three main types of tree-building methods: distance, parsimony and likelihood. The distance matrix method is widely applied because of its simplicity and solid theoretical basis. This thesis undertakes exploratory research to improve the distance method.

The distance matrix method is commonly used to construct phylogenetic trees, but the traditional method is based on sequence alignment, which introduces subjective factors that distort the original state of whole genome sequences. Alignment is computationally expensive, and distances computed directly from pairwise sequence alignments depend on sequence length and do not represent real evolutionary phenomena well.

To solve this problem, we propose a new similarity measure, called the discrete measure, which measures the similarity between sequences without alignment, is free of subjective interference, and is relatively intuitive and inexpensive to compute. Based on the discrete measure, we propose an improved vertical-horizontal algorithm. After the original vertical-horizontal algorithm builds the connected graph, the weights must be sorted; our improved algorithm does not need this sorting step.

The distance matrix method based on the discrete measure is put forward on the basis of information theory. This method converts DNA sequences into objects such as count vectors and frequency vectors, which are analyzed and processed with existing mathematical tools such as linear algebra, statistical theory and information theory. In this work, we use the information discrete measure to quantify the similarity or dissimilarity between vectors.
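As a rough illustration of turning sequences into frequency vectors and comparing them without alignment, the sketch below counts overlapping substrings of a fixed length k and compares two resulting vectors. The abstract does not give the formula for the discrete measure, so the Jensen-Shannon divergence is used here purely as a stand-in information-theoretic dissimilarity; it is not the thesis's measure. The function names `kmer_frequencies` and `js_divergence` and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Frequency vector of overlapping length-k substrings over A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Substrings containing other symbols (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid substrings of length k")
    return [counts[m] / total for m in kmers]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency vectors.

    Assumption: used only as a stand-in for the thesis's discrete measure.
    """
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy sequences, not data from the thesis.
d = js_divergence(kmer_frequencies("ACGTACGTAC", 2),
                  kmer_frequencies("ACGGTTCAAC", 2))
```

Because the vectors have a fixed length of 4^k regardless of sequence length, two genomes of different sizes can be compared directly, which is the property the alignment-free approach relies on.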
To extract the similarity between sequences, the algorithm first encodes the sequences as K-strings and then calculates the information gain. We then build the distance matrix, construct the phylogenetic trees from it, and compare the results with those of other methods.

The phylogenetic tree construction system is built on the LabVIEW platform and can conveniently load distance matrix data from an Excel or TXT file and present the result graphically. To assess the feasibility of constructing phylogenetic trees based on the discrete measure, we select the whole mitochondrial genome sequences of 10 mammals as a dataset and use the Neighbor.exe program of the PHYLIP software package to evaluate it; the experiment verifies the feasibility of the method.
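A distance matrix has to be serialized before a program such as PHYLIP's Neighbor can read it. The sketch below writes a square matrix in what is, to the best of our understanding, the classic PHYLIP layout: the number of taxa on the first line, then one row per taxon with the name padded to 10 characters. The function name `phylip_square_matrix`, the taxon labels and the distance values are placeholders, not material from the thesis.

```python
def phylip_square_matrix(names, dist):
    """Format a square distance matrix as PHYLIP-style text.

    Assumes the classic layout: taxon count on the first line, then one
    row per taxon with the name truncated or padded to 10 characters.
    """
    if len(names) != len(dist) or any(len(row) != len(names) for row in dist):
        raise ValueError("matrix must be square and match the number of names")
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, dist):
        label = name[:10].ljust(10)
        lines.append(label + "".join(f" {d:.6f}" for d in row))
    return "\n".join(lines) + "\n"


# Placeholder labels and distances for illustration only.
text = phylip_square_matrix(
    ["taxon_a", "taxon_b", "taxon_c"],
    [[0.0, 0.12, 0.30],
     [0.12, 0.0, 0.25],
     [0.30, 0.25, 0.0]],
)
```

The returned text could be saved to a file and supplied to Neighbor as its input; the exact file name and menu options depend on the PHYLIP version in use.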
Keywords/Search Tags: phylogenetic trees, discrete measure, distance matrix, information gain, information theory