Font Size: a A A

Research On Algorithms Of Gene-Duplication And Gene-Loss Based On Nearest Neighbour Interchange Tree Operation

Posted on:2012-02-26Degree:MasterType:Thesis
Country:ChinaCandidate:M CuiFull Text:PDF
GTID:2210330338962714Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The existing species has genetic links due to inheritance. We can use the context information to construct a tree called species tree to reflect the evolutionary relationship between them. With the development of genetic technology, we hold more and more genomic sequence information which provides the vast amounts of potential information for phylogenetic analyses. In order to build a phylogenetic tree for a set of species, we constructs a phylogenetic tree from genes taken from those species. Such tree is called gene tree. The evolution of the chosen genes mimics the evolution of the species themselves. There are a lot of complex evolutionary processes such as gene duplication and gene loss, so trees constructed on genes do not always accurately represent the evolutionary history of the corresponding species. In the actual construction of the optimal tree species, we need to consider the impact of these factors according to certain criteria.Constructing species trees is a fundamental scientific problem facing human being today. Ma Bin et al. have proved this problem to be NP-hard. Therefore, in order to solve this problem effectively, in practice, we need to use some heuristic algorithms to find an optimum species tree. Existing heuristics usually use local search steps to solve these problems.The Gene-duplication and Gene-loss problem are based on the gene duplication-loss model proposed by Goodman et al. M.S. Bansal et al. designed a series of heuristic algorithms to apply to the actual study of species evolution. These heuristics aimed at solving the gene-duplication and gene-loss problem calculate the eigenvalue of every possible species tree in the tree space obtained by tree operations, then perform local search in the tree space to find the local optimal solution. There are three common tree operations:NNI, SPR and TBR. The algorithm of gene-duplication problem base on NNI, SPR and TBR have been optimized to O(kmn), O(kmn) and O(kmnlogm) separately. The algorithm of gene-loss problem base on SPR and TBR have been optimized to O(kmn), O(kmn2) separately.In this paper, we optimized two problems based on original research.(1)Proposed a new algorithm for 1-NNI gene-duplication problem. Through the classification and analysis of the nodes in gene trees, we can eliminate many redundant calculations. So the running time could be greatly reduced. We verify the performance of our algorithm in a comparison experiment using sets of large randomly generated gene trees.(2) Proposed a new algorithm for gene-loss problem based on NNI operation. By analyzing the nodes of which loss value were not changed after the second NM operation, we can get a series of properties and theorems which can be reused to solve the problem. The naive solution for this problem require O(kmn2). The new algorithm we provide can solve this problem in O(kmn) time.
Keywords/Search Tags:Gene duplication, Gene Loss, Species Tree, Gene Tree, NNI
PDF Full Text Request
Related items