Font Size: a A A

Study Of Methods Of Constructing Evolutionary Trees With DNA Sequences

Posted on:2009-11-03Degree:DoctorType:Dissertation
Country:ChinaCandidate:J F LiFull Text:PDF
GTID:1118360278461977Subject:Artificial Intelligence and information processing
Abstract/Summary:PDF Full Text Request
An evolutionary tree, also known as phylogenetic tree, is a tree structure showing the evolutionary relationship among various organisms. A reliable evolutionary tree inference not only helps to understand the evolutionary history and evolution mechanisms in biology, it is also of great significance to many branches research in biomedical science and computational molecular biology.Currently, a rich variety of evolutionary tree reconstruction methods have been developed, which fall into three categories, (a) distance based methods, (b) maximum parsimony methods and (c) approaches applying the maximum likelihood principle. It is known that maximum likelihood can recover correct trees more frequently than other methods. But high computational complexity is the main disadvantage. In order to improve maximum likelihood, the paper launched an in-depth study from the following four aspects:(1) Proposed a fast and effective topological rearrangement operatorDue to the very high computational complexity, maximum likelihood in application is based on heuristics. Although using different search strategies, the heuristics are all to try to improve a starting tree / starting trees by a series of elementary topological rearrangement operators, until local optima is found. It is obvious that the performance of the heuristics depends on the degree of exhaustiveness of the topological rearrangement operators on some extent. The topological rearrangement operators often used in heuristics are Nearest Neighbor Interchange (NNI), Subtree Prune and Regraft (SPR) and Tree Bisection and Reconnection (TBR), and TBR is the most exhaustive. Even TBR searches, however, can often get trapped in local optima, since there are not many trees accessible in one step from any given tree. Another more exhaustive topological rearrangement operator is p-Edge Contraction and Refinement (p-ECR). However, due to its high computation complexity, p-ECR has rarely been used in practice. This paper presented a fast and efficient topological rearrangement operation combining p-ECR and neighbor joining. The main idea of p-ECRNJ is to to use neighbor joining to refine the unresolved nodes produced in p-ECR. Thus, in very iteration, p-ECRNJ only spends O(p3) to resolve unresolved nodes, and without having to spend a lot of time to evaluate all possible trees (at most (2p+1)!!), which significantly reduced the time complexity of p-ECR. Experiments with 12 real datasets show that p-ECRNJ based heuristics can find better trees than other two popular evolutionary tree reconstruction methods (PHYML and RAxML) in considerable time. Moreover, p-ECRNJ can improve efficiently other local topological rearrangement operators, such as NNI. All the results prove that p-ECRNJ is efficient.(2) Proposed a PSO based evolutionary tree reconstruction methodHill climbing has been widely used in evolutionary tree reconstruction and has achieved some success. However, for hill-climbing strategies to be guaranteed to find the global optimum the optimality landscape must be unimodal. Moreover, research has demonstrated that multiple maxima exist under the maximum likelihood principle. Therefore, hill climbing based evolutionary tree reconstruction methods often return local optima. In principle, PSO (Particle Swarm Optimization) is the parallel and dynamic version of hill climbing and has a higher probability to break away from the local optima effectively. This paper presented a PSO based evolutionary tree reconstruction method named PSOML(Particle Swarm Optimization based Maximum Likelihood). PSOML uses the framework of the basic PSO principle, every particle in the swarm corresponds to a possible evolutionary tree and the update of particles'state is accomplished by p-ECRNJ, while p is dependent on particles'velocity. Experiments with real datasets show that PSOML clearly outperforms PHYML and RAxML in terms of solution optimality, although it cannot compete in terms of runtimes. Moreover, PSOML can find better trees than GARLI across a range of datasets in considerable time.(3) Proposed an evolutionary tree reconstruction combing Quartet Puzzling and neighbor joiningDue to the high time complexity of maximum likelihood, divide-and-conqure based quartet methods of phylogeny reconstruction are currently receiving considerable attention. The most popular is Quartet Puzzling (QP). It first use maximum likelihood to estimate the topology quartet set Q, and then use a greedy algorithm to merge Q into an evolutionary tree containing all sequences. But research shows that QP is not as accurate as expected. How to effectively recombine the information contained in Q is still a hard problem facing QP. Neighbor joining is a clustering technique, which has many good characteristics. However, the performance of neighbor joining depends on the quality of the estimated evolutionary distances of different sequences. Long branch is still a problem facing neighbor joining. Therefore, this paper proposed a reconstruction method QPNJ (Quartet puzzling and Neighbor Joning) combing neighbor joining and QP. The main idea of QPNJ is to use maximum likelihood to estimate the set Q of quartet topologies, estimate the evolutionary distances between every two sequences according to Q, and use neighbor joining to reconstruct an evolutionary tree according to the evolutionary distances. One hand, using neighbor joining with better theoretical guarantees as the recombining technique can improve QP; On the other hand, QPNJ can avoid the long branch problem faced by neighbor joining. In theory, QPNJ has the same time complexity with QP. It is worth noting that the combination of QP and neighbor joining is not only, any clustering method like neighbor joining can replace neighbor joining to combine with QP. Finally, simulation experiments show that QPNJ and QPWNJ(the combination of QP and Weighbor) are more accurate than the QP, and even than neighor joining and Weighbor. Moreover, not like QP, QPNJ and QPWNJ do not rely on the structures of model trees. These experiments results prove the combination of QP and clustering methods such as neighbor joining and Weighbor is efficient.(4) Proposed an evolutionary tree reconstruction combing homotopy with SEMReconstructing evolutionary trees according to current sequences is a typical machine learning problem from incomplete data. SEM(Structural Expectation Maximization) is very efficient to estimate model structures using maximum likelihood with incomplete data. Starting from a structure, SEM iteratively probabilistically complete the data according to the distribution induced by current model and use the completed data to evaluate different candidate structures. However, in SEM, the condition probability of the hidden variables is directly computed by Bayes'rule and the structure obtained in every iteration is optimized with respect to the expected likelihood value of the optima in last iteration. As a result, at later iterations of the procedure, the trees that maximize this expected likelihood value will tend to be similar to the tree found in the previous iteration. Therefore, the performance of SEM depends on the starting points. Moreover, research has demonstrated that multiple maxima exist under the maximum likelihood principle. Thus, the reconstruction algorithm using SEM can often be strapped in local optima. The homotopy method belongs to the field of global optimization techniques. The main idea is that a smoothed version of the objective function is first optimized. With enough smoothing, the optimization will be convex and the global optimum can be found. Smoothing then increases and the new optimal are computed, where the solution found in the previous step serves as a starting point. The algorithm iterates until there is no smoothing. On the basis of SEM and homotopy, this paper proposed an evolutionary tree reconstruction, which is named HSEMPHY. HSEMPHY computes the condition probability of hidden variables through maximum entropy principle. It can reduce the influence of the initial value on the final solution by introducing the homotopy parameterβand simulating the process of homotopy continuation principle. Experimental results with real datasets show that HSEMPHY is at least as good as the two most popular methods PHYML and RAxML while it is very robust to poor starting trees.
Keywords/Search Tags:Evolutionary tree reconstruction, Maximum likelihood, Neighbor joining, Topological rearrangement, Quartet methods, Structural expectation maximization
PDF Full Text Request
Related items